SlideShare une entreprise Scribd logo
1  sur  39
Magdalena Balazinska
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
ESCIENCE INSTITUTE
UNIVERSITY OF WASHINGTON
http://www.cs.washington.edu/people/faculty/magda
Data Pricing and Data License
Agreements in the Cloud
1
Acknowledgments
Research team on project
• Prasang Upadhyaya (UW, lead on licensing)
• Paraschos Koutris (UW, lead on pricing)
• Prof. Dan Suciu (UW)
• Dr. Hakan Hacigumus (NEC Labs)
Sponsors
• National Science Foundation
• NEC
• Microsoft Research
Magdalena Balazinska - University of Washington 2
Data Has Value
• First wave of computing: value in hardware
– IBM, Intel, DEC
• Second wave of computing: value in software
– Microsoft, Oracle, Google
• Third wave of computing: value in data
– Dun & Bradstreet, Factual, Facebook, Google
Magdalena Balazinska - University of Washington 3
Data itself is now a product that is being
created, improved, bought and sold on the Web
Magdalena Balazinska - University of Washington 4
What are the Technical Challenges
• Challenge 1: Data License Agreements
– All data comes with terms of use
– Can we automate their enforcement?
• Challenge 2: Data Pricing
– Existing pricing methods are limited
– Can we support flexible pricing?
Magdalena Balazinska - University of Washington 5
Data Comes with Terms of Use
Overlaying map data
with any other data is
prohibited (Navteq)
Each book may be lent
once for 2 weeks while
being inaccessible by
the lender (Kindle)
Name Ailment Birth
date
Se
x
Locati
on
John Doe Asthma Jan 7th 1972 M Seattle
Mary
Jane
Dislocated
shoulder
Mar 21st
1965
F San Diego
Alice
Summer
Flu May 28th
1986
M San
Francisco
… … … … …
Bob B Flu Oct 14th
2000
M Miami
Medical Data Maps Digital Books
Queries that try to identify
an individual referenced in
the database are
prohibited (MIMIC II)
6
More Examples
Terms of use Source
Overlaying Navteq data with any other data is prohibited Navteq
Each book may be lent once for a duration of 14 days and will not be
readable by the lender during the loan period
Amazon Kindle
In a month, all queries may, in total, return up to 2M characters of
data at the free tier
Microsoft Translator
OAuth calls are permitted 350 requests per hour Twitter and
Foursquare
Queries that try to identify an individual referenced in the database
are prohibited
MIMIC II
You are required to display all attribution information and any
proprietary notices associated with the Foursquare Data
Foursquare, Yelp,
World Bank
Don’t aggregate or blend our star ratings and review counts with other
providers. You may show content from multiple providers, but Yelp
data should stand on its own …
Yelp
7
Fine-grained Access
Look up specific patient
Augmenting Data Sources
Join medical data with
voters registry
Terms of Use Control Use, Not Access
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
Histogram
0
20000
40000
60000
80000
100000
120000
140000
0 5 10 15 20 25
Linear Regression
Data
Permitted
Denied
Goal
Enforce policies
to constrain
how data is used
8
Example for medical research data
Today: Written Agreements
Magdalena Balazinska - University of Washington 9
…
Average length: Over 8 pages!
Or Detailed Courses
Magdalena Balazinska - University of Washington 10
Problem with License Agreements
• Burden users with compliance
• Assume that users will comply
Magdalena Balazinska - University of Washington 11
Managing Data License
Agreements with DataLawyer
Trust but verify
Data
D
A
T
A
L
A
W
Y
E
R
Honest but curious
Honest but careless
Malicious
13
Approach Overview
• Data seller defines policies
• Data and policies are loaded into a DataLawyer-
enabled database system
• Buyer queries the data
• DataLawyer checks all queries before execution
Magdalena Balazinska - University of Washington 14
Challenge: Semantics
Example Policy 1: Can access up to
10K records/month.
If the buyer computes a histogram
on the data and filters out some
buckets, did he use the input tuples
from the filtered bucket or not?
15
Challenge: Performance
Example Policy 2: Only allow
aggregate queries where each
output tuple must aggregate over
at least 10 values.
Policies are expensive to check online!
Cheap
Expensive
16
DataLawyer Setup
17
Data
Metadata on
Usage
Policies
Features of
user and query
behavior
Data Usage
Arbitrary code
Shared across
multiple policies
SELECT DISTINCT ‘P5 violated:
Fewer than 10 patients contribute
to an answer’ AS errorMessage
FROM Provenance p
WHERE p.irid = ‘patients’
GROUP BY p.qid, p.otid
HAVING COUNT(DISTINCT p.itid) < 10
SELECT DISTINCT ‘P5 violated:
Fewer than 10 patients contribute
to an answer’ AS errorMessage
FROM Provenance p
WHERE p.irid = ‘patients’
GROUP BY p.qid, p.otid
HAVING COUNT(DISTINCT p.itid) < 10
SELECT DISTINCT ‘P5 violated:
Fewer than 10 patients contribute
to an answer’ AS errorMessage
FROM Provenance p
WHERE p.irid = ‘patients’
GROUP BY p.qid, p.otid
HAVING COUNT(DISTINCT p.itid) < 10
SELECT DISTINCT ‘P5 violated:
Fewer than 10 patients contribute
to an answer’ AS errorMessage
FROM Provenance p
WHERE p.irid = ‘patients’
GROUP BY p.qid, p.otid
HAVING COUNT(DISTINCT p.itid) < 10
Declarative policies
(DataLawyer uses SQL)
Usage Log
18
They capture features of a query that are used in the policies
We require them to:
1. Be deterministic
2. Be append only
3. Contain a timestamp with each tuple
Examples are:
1. Provenance
2. User log
3. Static analysis of the query
4. Pricing
5. …
DataLawyer: Operation
Data
D
A
T
A
L
A
W
Y
E
R
19
f1(Q, D)
f2(Q, D)
…
fm(Q, D)
P1
P2
…
Pn
Query Q
Execute query Q(D)
Populate usage log
Execute policy queries
DataLawyer Workflow Example
Magdalena Balazinska - University of Washington 20
pid disease treatment outcome
1 asthma albuterol positive
…
Data: Patients
Query: What fraction of asthma patients were treated with albuterol?
DataLawyer: Populates the usage logs
uid query table column
1 1 Patient treatment
1 1 Patient outcome
DataLawyer checks policies, which are queries over the usage logs
• Queries are not allowed to access column pid
• Queries must aggregate data from at least 10 rows in Patients
Example Using SQL
21
Policy: Stop queries where fewer than 10 patients contribute to any output tuple.
SELECT DISTINCT ‘P5 violated: Fewer than 10 patients contribute to an answer’
AS errorMessage
FROM Provenance p
WHERE p.irid = ‘patients’
GROUP BY p.qid, p.otid
HAVING COUNT(DISTINCT p.itid) < 10
Usage log that captures how each tuple in query result
was derived from records on disk
Provenance(ts, // Timestamp
qid, // Query id
otid, // Output tuple id, a hash of the output tuple
irid, // Input relation id, usually the name
itid // Input tuple id, usually the name
)
If false, no violation
If true, at least
one example of a violation
Policy refers to the provenance usage log
Need for Optimizations
22
0
50
100
150
200
250
300
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Averagetimeforthebatch(inms)
Batch number (Each batch consists of 120 queries)
Baseline, Hard
DataLawyer, Hard
Baseline, Easy
DataLawyer, Easy
Without optimizations,
time to process queries &
check policies grows
Time grows even for simple policies
DataLawyer keeps time constant
DataLawyer keeps time low
Policy Evaluation
• There are three major steps:
– Generate the usage logs
– Evaluate policies
– Write log to disk if everything is okay, else abort
• Our optimizations:
– Avoid generating the logs
– Prune the logs by removing data no longer needed
– Avoid evaluating all policies
– Try to evaluate cheaper, partial policies first
23
DataLawyer Performance Illustration
Magdalena Balazinska - University of Washington 24
Data License Agreements Summary
• Data comes with terms of use
– Even free data often has terms of use
• Today, terms of use are written in natural language
– Compliance for buyers is tedious and error-prone
• Possible to automate the process: DataLawyer
– Enables more precise terms of use specification
– Enables efficient enforcement
• Open problems
– Malicious users
– Data leaving database system
Magdalena Balazinska - University of Washington 25
What are the Technical Challenges
• Challenge 1: Data License Agreements
– All data comes with terms of use
– Can we automate their enforcement?
• Challenge 2: Data Pricing
– Existing pricing methods are limited
– Can we support flexible pricing?
Magdalena Balazinska - University of Washington 26
Data Pricing Today: Fixed
Magdalena Balazinska - University of Washington 27
Not flexible!
Data Pricing Today: Subscriptions
Magdalena Balazinska - University of Washington 28
Not flexible!
Data Pricing Today: Private Price
Magdalena Balazinska - University of Washington 29
Not scalable!
Example Scenario
• Seller has a database of cities and business contact information
– Businesses in one province or state: $300
– One type of business: $150
– Cities with given climate: $10
• Buyer:
– Q1: “Businesses with more than 200 employees” (selection)
– Q2: “West-coast businesses in cities with high yearly
precipitation” (join)
• How to satisfy buyer?
Magdalena Balazinska - University of Washington 30
Current Pricing: Fixed Prices
• Fixed price for entire dataset
• Must create and price views specific to queries Q1 and Q2
• OR user must buy entire dataset if view not available
• AND user must perform joins by herself
• Certainly the case if datasets have different owners
31Magdalena Balazinska - University of Washington
Current Pricing: Subscriptions
• Subscriptions
– Fixed number of transactions per month
– Must create and price appropriate parameterized queries
– Today queries are dataset specific (i.e., no joins!)
– Can satisfy Q1: “Businesses with more than 200 employees”
– Cannot Q2: “West-coast businesses in cities with high yearly
precipitation”
32Magdalena Balazinska - University of Washington
Other Data Pricing Issues
• Today’s data pricing can also have bad properties
• Example: Weather Imagery on Azure DataMarket
– 1,000,000 transactions -> $2,400
– 100,000 -> $600
– 10,000 -> $120
– 2,500 -> $0
• Arbitrage opportunity:
– Emulate many users
– Get as much data as you want for free!
33Magdalena Balazinska - University of Washington
Data Pricing with QueryMarket
Query-Based Pricing
• Seller specifies a set of queries Q1, … Qn
• These queries form views on the data for sell
• Seller prices the views: price(Q1), …, price(Qn)
– D = all cities and businesses in North America
– V1 (businesses in one state) = $300
– V2 (businesses of one type) = $150
– V3 (cities with a given climate) = $10
35Magdalena Balazinska - University of Washington
Query-Based Pricing
• QueryMarket system computes other query prices
– Q2: “West-coast businesses in cities with high yearly
precipitation”
– Key idea: Compute least expensive set of views that can
be used to answer the query. The sum of the price of
these views is the price of the query
• System guarantees price properties
– Arbitrage-free prices
– Maximal prices (no unintended discounts)
36Magdalena Balazinska - University of Washington
Conclusion
• Data has value
• Data is bought and sold online
• Supporting modern data markets requires
– New tools for managing license agreements
– New methods for pricing data
• Much work remains to be done
http://cloud-data-pricing.cs.washington.edu
Magdalena Balazinska - University of Washington 37
Magdalena Balazinska - University of Washington 38
Potential Techniques
Privacy
Mechanisms
(Reduces data utility)
Intrusion
Detection System
(Assumes the user is
malicious. Offline)
Access Control
(Want full access to
data)
Auditing Systems
(Offline)
Online Offline
Fuzzy
Semantics
Precise
Semantics
None of these work!
39

Contenu connexe

Tendances

Rubbish in Rubbish out: applying good data governance techniques to gain maxi...
Rubbish in Rubbish out: applying good data governance techniques to gain maxi...Rubbish in Rubbish out: applying good data governance techniques to gain maxi...
Rubbish in Rubbish out: applying good data governance techniques to gain maxi...Ringgold Inc
 
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...Anastasija Nikiforova
 
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)Krishnaram Kenthapadi
 
Big data in Healthcare & Life Sciences
Big data in Healthcare & Life SciencesBig data in Healthcare & Life Sciences
Big data in Healthcare & Life SciencesMatthias Vallaey
 
Improving pharmaceutical marketing using big data solutions
Improving pharmaceutical marketing using big data solutionsImproving pharmaceutical marketing using big data solutions
Improving pharmaceutical marketing using big data solutionsPaul Grant
 
Future of RWE - Big Data and Analytics for Pharma 2017 presentation
Future of RWE - Big Data and Analytics for Pharma 2017 presentationFuture of RWE - Big Data and Analytics for Pharma 2017 presentation
Future of RWE - Big Data and Analytics for Pharma 2017 presentationSaama
 
Microsoft SQL Server always on solutions guide for high availability and dis...
Microsoft  SQL Server always on solutions guide for high availability and dis...Microsoft  SQL Server always on solutions guide for high availability and dis...
Microsoft SQL Server always on solutions guide for high availability and dis...Компания Робот Икс
 
Detecting health insurance fraud using analytics
Detecting health insurance fraud using analytics Detecting health insurance fraud using analytics
Detecting health insurance fraud using analytics Nitin Verma
 
Enabling Better Clinical Operations through a Clinical Operations Store
Enabling Better Clinical Operations through a Clinical Operations StoreEnabling Better Clinical Operations through a Clinical Operations Store
Enabling Better Clinical Operations through a Clinical Operations StoreSaama
 
Developing A Universal Approach to Cleansing Customer and Product Data
Developing A Universal Approach to Cleansing Customer and Product DataDeveloping A Universal Approach to Cleansing Customer and Product Data
Developing A Universal Approach to Cleansing Customer and Product DataFindWhitePapers
 
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...Anastasija Nikiforova
 
Building a Next Generation Clinical and Scientific Data Management Solution
Building a Next Generation Clinical and Scientific Data Management SolutionBuilding a Next Generation Clinical and Scientific Data Management Solution
Building a Next Generation Clinical and Scientific Data Management SolutionSaama
 
Data Science and its relationship to Big Data and data-driven decision making
Data Science and its relationship to Big Data and data-driven decision makingData Science and its relationship to Big Data and data-driven decision making
Data Science and its relationship to Big Data and data-driven decision makingDr. Volkan OBAN
 

Tendances (17)

Rubbish in Rubbish out: applying good data governance techniques to gain maxi...
Rubbish in Rubbish out: applying good data governance techniques to gain maxi...Rubbish in Rubbish out: applying good data governance techniques to gain maxi...
Rubbish in Rubbish out: applying good data governance techniques to gain maxi...
 
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...
 
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
 
Big data in Healthcare & Life Sciences
Big data in Healthcare & Life SciencesBig data in Healthcare & Life Sciences
Big data in Healthcare & Life Sciences
 
Improving pharmaceutical marketing using big data solutions
Improving pharmaceutical marketing using big data solutionsImproving pharmaceutical marketing using big data solutions
Improving pharmaceutical marketing using big data solutions
 
Future of RWE - Big Data and Analytics for Pharma 2017 presentation
Future of RWE - Big Data and Analytics for Pharma 2017 presentationFuture of RWE - Big Data and Analytics for Pharma 2017 presentation
Future of RWE - Big Data and Analytics for Pharma 2017 presentation
 
Sub1555
Sub1555Sub1555
Sub1555
 
Microsoft SQL Server always on solutions guide for high availability and dis...
Microsoft  SQL Server always on solutions guide for high availability and dis...Microsoft  SQL Server always on solutions guide for high availability and dis...
Microsoft SQL Server always on solutions guide for high availability and dis...
 
Detecting health insurance fraud using analytics
Detecting health insurance fraud using analytics Detecting health insurance fraud using analytics
Detecting health insurance fraud using analytics
 
Enabling Better Clinical Operations through a Clinical Operations Store
Enabling Better Clinical Operations through a Clinical Operations StoreEnabling Better Clinical Operations through a Clinical Operations Store
Enabling Better Clinical Operations through a Clinical Operations Store
 
Risk mgmt-analysis-wp-326822
Risk mgmt-analysis-wp-326822Risk mgmt-analysis-wp-326822
Risk mgmt-analysis-wp-326822
 
Analytics in Pharmaceutical Industry
Analytics in Pharmaceutical IndustryAnalytics in Pharmaceutical Industry
Analytics in Pharmaceutical Industry
 
Trial io pcori doc v1
Trial io pcori doc v1Trial io pcori doc v1
Trial io pcori doc v1
 
Developing A Universal Approach to Cleansing Customer and Product Data
Developing A Universal Approach to Cleansing Customer and Product DataDeveloping A Universal Approach to Cleansing Customer and Product Data
Developing A Universal Approach to Cleansing Customer and Product Data
 
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...
 
Building a Next Generation Clinical and Scientific Data Management Solution
Building a Next Generation Clinical and Scientific Data Management SolutionBuilding a Next Generation Clinical and Scientific Data Management Solution
Building a Next Generation Clinical and Scientific Data Management Solution
 
Data Science and its relationship to Big Data and data-driven decision making
Data Science and its relationship to Big Data and data-driven decision makingData Science and its relationship to Big Data and data-driven decision making
Data Science and its relationship to Big Data and data-driven decision making
 

En vedette

My credentials for linked in Dec 2015
My credentials for linked in Dec 2015My credentials for linked in Dec 2015
My credentials for linked in Dec 2015Leanne Wellings
 
What is the best platform for managing social media activity ?
What is the best platform for managing social media activity ?What is the best platform for managing social media activity ?
What is the best platform for managing social media activity ?Ludovic Martin
 
Sergio Zamorano Ingeniería
Sergio Zamorano IngenieríaSergio Zamorano Ingeniería
Sergio Zamorano IngenieríaSergio Zamorano
 
đề thi tin học văn phòng B
đề thi tin học văn phòng Bđề thi tin học văn phòng B
đề thi tin học văn phòng BThanhphong95
 
2016 Landscape Architecture Portfolio: Zachary B.L. Rees
2016 Landscape Architecture Portfolio: Zachary B.L. Rees2016 Landscape Architecture Portfolio: Zachary B.L. Rees
2016 Landscape Architecture Portfolio: Zachary B.L. ReesZachary B. Rees
 
โครงงานคอมพิวเตอร์
โครงงานคอมพิวเตอร์ โครงงานคอมพิวเตอร์
โครงงานคอมพิวเตอร์ nareudol niramarn
 
アルカナに入社しました。
アルカナに入社しました。アルカナに入社しました。
アルカナに入社しました。Yukihiro Katsumi
 

En vedette (13)

Arif letest cv
Arif letest cvArif letest cv
Arif letest cv
 
Compte rendu journée icar
Compte rendu journée icarCompte rendu journée icar
Compte rendu journée icar
 
ateliers D (french)
 ateliers D (french) ateliers D (french)
ateliers D (french)
 
My credentials for linked in Dec 2015
My credentials for linked in Dec 2015My credentials for linked in Dec 2015
My credentials for linked in Dec 2015
 
What is the best platform for managing social media activity ?
What is the best platform for managing social media activity ?What is the best platform for managing social media activity ?
What is the best platform for managing social media activity ?
 
Projet tice-21-janv
Projet tice-21-janvProjet tice-21-janv
Projet tice-21-janv
 
Sergio Zamorano Ingeniería
Sergio Zamorano IngenieríaSergio Zamorano Ingeniería
Sergio Zamorano Ingeniería
 
đề thi tin học văn phòng B
đề thi tin học văn phòng Bđề thi tin học văn phòng B
đề thi tin học văn phòng B
 
Em ball By ACR 58
Em ball By ACR 58Em ball By ACR 58
Em ball By ACR 58
 
การสื่อสารข้อมูล
การสื่อสารข้อมูลการสื่อสารข้อมูล
การสื่อสารข้อมูล
 
2016 Landscape Architecture Portfolio: Zachary B.L. Rees
2016 Landscape Architecture Portfolio: Zachary B.L. Rees2016 Landscape Architecture Portfolio: Zachary B.L. Rees
2016 Landscape Architecture Portfolio: Zachary B.L. Rees
 
โครงงานคอมพิวเตอร์
โครงงานคอมพิวเตอร์ โครงงานคอมพิวเตอร์
โครงงานคอมพิวเตอร์
 
アルカナに入社しました。
アルカナに入社しました。アルカナに入社しました。
アルカナに入社しました。
 

Similaire à Magdalena Balazinska: Data pricing and data license agreements

Next Gen Clinical Data Sciences
Next Gen Clinical Data SciencesNext Gen Clinical Data Sciences
Next Gen Clinical Data SciencesSaama
 
The Role of Audit Analysis in CyberSecurity
The Role of Audit Analysis in CyberSecurityThe Role of Audit Analysis in CyberSecurity
The Role of Audit Analysis in CyberSecurityTyrone Grandison
 
Big Data, Big Investment
Big Data, Big InvestmentBig Data, Big Investment
Big Data, Big InvestmentGGV Capital
 
Smarter analytics101 v2.0.1
Smarter analytics101 v2.0.1Smarter analytics101 v2.0.1
Smarter analytics101 v2.0.1Jenawahl
 
Big Data Expo 2015 - Trillium software Big Data and the Data Quality
Big Data Expo 2015 - Trillium software Big Data and the Data QualityBig Data Expo 2015 - Trillium software Big Data and the Data Quality
Big Data Expo 2015 - Trillium software Big Data and the Data QualityBigDataExpo
 
Customer Profiling using Data Mining
Customer Profiling using Data Mining Customer Profiling using Data Mining
Customer Profiling using Data Mining Suman Chatterjee
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Roger Barga
 
Data Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using itData Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using itDomino Data Lab
 
DataSpryng Overview
DataSpryng OverviewDataSpryng Overview
DataSpryng Overviewjkvr
 
Data Services Marketplace
Data Services MarketplaceData Services Marketplace
Data Services MarketplaceDenodo
 
Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...
Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...
Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...Saama
 
Microsoft: A Waking Giant In Healthcare Analytics and Big Data
Microsoft: A Waking Giant In Healthcare Analytics and Big DataMicrosoft: A Waking Giant In Healthcare Analytics and Big Data
Microsoft: A Waking Giant In Healthcare Analytics and Big DataHealth Catalyst
 
Cloud and business agility
Cloud and business agilityCloud and business agility
Cloud and business agilityMike ORourke
 
How MongoDB is Transforming Healthcare Technology
How MongoDB is Transforming Healthcare TechnologyHow MongoDB is Transforming Healthcare Technology
How MongoDB is Transforming Healthcare TechnologyMongoDB
 
Data For Dummies
Data For DummiesData For Dummies
Data For DummiesC.A.F.C.A.
 
ch1vsat2k_BDA_Introduction11Jan17-converted.pptx
ch1vsat2k_BDA_Introduction11Jan17-converted.pptxch1vsat2k_BDA_Introduction11Jan17-converted.pptx
ch1vsat2k_BDA_Introduction11Jan17-converted.pptxMrityunjay Emmi
 
Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...
Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...
Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...DATAVERSITY
 
Data Mining Xuequn Shang NorthWestern Polytechnical University
Data Mining Xuequn Shang NorthWestern Polytechnical UniversityData Mining Xuequn Shang NorthWestern Polytechnical University
Data Mining Xuequn Shang NorthWestern Polytechnical Universitybutest
 
Microsoft: A Waking Giant in Healthcare Analytics and Big Data
Microsoft: A Waking Giant in Healthcare Analytics and Big DataMicrosoft: A Waking Giant in Healthcare Analytics and Big Data
Microsoft: A Waking Giant in Healthcare Analytics and Big DataDale Sanders
 

Similaire à Magdalena Balazinska: Data pricing and data license agreements (20)

Next Gen Clinical Data Sciences
Next Gen Clinical Data SciencesNext Gen Clinical Data Sciences
Next Gen Clinical Data Sciences
 
The Role of Audit Analysis in CyberSecurity
The Role of Audit Analysis in CyberSecurityThe Role of Audit Analysis in CyberSecurity
The Role of Audit Analysis in CyberSecurity
 
Big Data, Big Investment
Big Data, Big InvestmentBig Data, Big Investment
Big Data, Big Investment
 
Smarter analytics101 v2.0.1
Smarter analytics101 v2.0.1Smarter analytics101 v2.0.1
Smarter analytics101 v2.0.1
 
Big Data Expo 2015 - Trillium software Big Data and the Data Quality
Big Data Expo 2015 - Trillium software Big Data and the Data QualityBig Data Expo 2015 - Trillium software Big Data and the Data Quality
Big Data Expo 2015 - Trillium software Big Data and the Data Quality
 
Customer Profiling using Data Mining
Customer Profiling using Data Mining Customer Profiling using Data Mining
Customer Profiling using Data Mining
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
Data Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using itData Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using it
 
DataSpryng Overview
DataSpryng OverviewDataSpryng Overview
DataSpryng Overview
 
Data Services Marketplace
Data Services MarketplaceData Services Marketplace
Data Services Marketplace
 
Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...
Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...
Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...
 
Microsoft: A Waking Giant In Healthcare Analytics and Big Data
Microsoft: A Waking Giant In Healthcare Analytics and Big DataMicrosoft: A Waking Giant In Healthcare Analytics and Big Data
Microsoft: A Waking Giant In Healthcare Analytics and Big Data
 
Cloud and business agility
Cloud and business agilityCloud and business agility
Cloud and business agility
 
How MongoDB is Transforming Healthcare Technology
How MongoDB is Transforming Healthcare TechnologyHow MongoDB is Transforming Healthcare Technology
How MongoDB is Transforming Healthcare Technology
 
Data For Dummies
Data For DummiesData For Dummies
Data For Dummies
 
ch1vsat2k_BDA_Introduction11Jan17-converted.pptx
ch1vsat2k_BDA_Introduction11Jan17-converted.pptxch1vsat2k_BDA_Introduction11Jan17-converted.pptx
ch1vsat2k_BDA_Introduction11Jan17-converted.pptx
 
Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...
Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...
Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...
 
Big Data Forum - Phoenix
Big Data Forum - PhoenixBig Data Forum - Phoenix
Big Data Forum - Phoenix
 
Data Mining Xuequn Shang NorthWestern Polytechnical University
Data Mining Xuequn Shang NorthWestern Polytechnical UniversityData Mining Xuequn Shang NorthWestern Polytechnical University
Data Mining Xuequn Shang NorthWestern Polytechnical University
 
Microsoft: A Waking Giant in Healthcare Analytics and Big Data
Microsoft: A Waking Giant in Healthcare Analytics and Big DataMicrosoft: A Waking Giant in Healthcare Analytics and Big Data
Microsoft: A Waking Giant in Healthcare Analytics and Big Data
 

Plus de CBOD ANR project U-PSUD

Adel Ben Youssef: Determinants of the adoption of cloud computing by tunisian...
Adel Ben Youssef: Determinants of the adoption of cloud computing by tunisian...Adel Ben Youssef: Determinants of the adoption of cloud computing by tunisian...
Adel Ben Youssef: Determinants of the adoption of cloud computing by tunisian...CBOD ANR project U-PSUD
 
Lorraine Morgan: Factors affecting the adoption of cloud computing
Lorraine Morgan: Factors affecting the adoption of cloud computingLorraine Morgan: Factors affecting the adoption of cloud computing
Lorraine Morgan: Factors affecting the adoption of cloud computingCBOD ANR project U-PSUD
 
Nessrine Omrani & Serge Pajak: Privacy in the cloud
Nessrine Omrani & Serge Pajak: Privacy in the cloudNessrine Omrani & Serge Pajak: Privacy in the cloud
Nessrine Omrani & Serge Pajak: Privacy in the cloudCBOD ANR project U-PSUD
 
Assaf Schuster: Raas+Ginseng: resource allocation in the cloud
Assaf Schuster: Raas+Ginseng: resource allocation in the cloudAssaf Schuster: Raas+Ginseng: resource allocation in the cloud
Assaf Schuster: Raas+Ginseng: resource allocation in the cloudCBOD ANR project U-PSUD
 
Maurizio Naldi: Cloud computing modelling and adoption
Maurizio Naldi:  Cloud computing modelling and adoptionMaurizio Naldi:  Cloud computing modelling and adoption
Maurizio Naldi: Cloud computing modelling and adoptionCBOD ANR project U-PSUD
 
Victor Chang: Cloud computing business framework
Victor Chang: Cloud computing business frameworkVictor Chang: Cloud computing business framework
Victor Chang: Cloud computing business frameworkCBOD ANR project U-PSUD
 
Margherita Pagani: Digital business strategy & value creation
Margherita Pagani: Digital business strategy & value creationMargherita Pagani: Digital business strategy & value creation
Margherita Pagani: Digital business strategy & value creationCBOD ANR project U-PSUD
 
Sebatien Tran: Le SI et ses utilisatueurs…perspectives sur la stratégie; it d...
Sebatien Tran: Le SI et ses utilisatueurs…perspectives sur la stratégie; it d...Sebatien Tran: Le SI et ses utilisatueurs…perspectives sur la stratégie; it d...
Sebatien Tran: Le SI et ses utilisatueurs…perspectives sur la stratégie; it d...CBOD ANR project U-PSUD
 
Marie-Helène Delmond: Business models in traditional versus pure digital indu...
Marie-Helène Delmond: Business models in traditional versus pure digital indu...Marie-Helène Delmond: Business models in traditional versus pure digital indu...
Marie-Helène Delmond: Business models in traditional versus pure digital indu...CBOD ANR project U-PSUD
 
Nabil Sultan. The disruptive and democratizing credentials of cloud computing
Nabil Sultan. The disruptive and democratizing credentials of  cloud computingNabil Sultan. The disruptive and democratizing credentials of  cloud computing
Nabil Sultan. The disruptive and democratizing credentials of cloud computingCBOD ANR project U-PSUD
 
Cinzia Battistella; Modeling a business ecosystem: a network analysis approach
Cinzia Battistella; Modeling a business ecosystem: a network analysis approachCinzia Battistella; Modeling a business ecosystem: a network analysis approach
Cinzia Battistella; Modeling a business ecosystem: a network analysis approachCBOD ANR project U-PSUD
 
Pierre jean benghozi: Business models and innovation; some lessons in the cul...
Pierre jean benghozi: Business models and innovation; some lessons in the cul...Pierre jean benghozi: Business models and innovation; some lessons in the cul...
Pierre jean benghozi: Business models and innovation; some lessons in the cul...CBOD ANR project U-PSUD
 
Xunhua Guo: Evolution of e-ordering in the chinese drug distribution industry...
Xunhua Guo: Evolution of e-ordering in the chinese drug distribution industry...Xunhua Guo: Evolution of e-ordering in the chinese drug distribution industry...
Xunhua Guo: Evolution of e-ordering in the chinese drug distribution industry...CBOD ANR project U-PSUD
 
Cyril Bartolo: European Users’ recommendations for the success of Public Clou...
Cyril Bartolo: European Users’ recommendations for the success of Public Clou...Cyril Bartolo: European Users’ recommendations for the success of Public Clou...
Cyril Bartolo: European Users’ recommendations for the success of Public Clou...CBOD ANR project U-PSUD
 
Omar el sawy: Digital business models in networked abundance
Omar el sawy: Digital business models in networked abundanceOmar el sawy: Digital business models in networked abundance
Omar el sawy: Digital business models in networked abundanceCBOD ANR project U-PSUD
 
Sjaak Brinkkemper: Visual Business Modeling Techniques for the Software Industry
Sjaak Brinkkemper: Visual Business Modeling Techniques for the Software IndustrySjaak Brinkkemper: Visual Business Modeling Techniques for the Software Industry
Sjaak Brinkkemper: Visual Business Modeling Techniques for the Software IndustryCBOD ANR project U-PSUD
 

Plus de CBOD ANR project U-PSUD (16)

Adel Ben Youssef: Determinants of the adoption of cloud computing by tunisian...
Adel Ben Youssef: Determinants of the adoption of cloud computing by tunisian...Adel Ben Youssef: Determinants of the adoption of cloud computing by tunisian...
Adel Ben Youssef: Determinants of the adoption of cloud computing by tunisian...
 
Lorraine Morgan: Factors affecting the adoption of cloud computing
Lorraine Morgan: Factors affecting the adoption of cloud computingLorraine Morgan: Factors affecting the adoption of cloud computing
Lorraine Morgan: Factors affecting the adoption of cloud computing
 
Nessrine Omrani & Serge Pajak: Privacy in the cloud
Nessrine Omrani & Serge Pajak: Privacy in the cloudNessrine Omrani & Serge Pajak: Privacy in the cloud
Nessrine Omrani & Serge Pajak: Privacy in the cloud
 
Assaf Schuster: Raas+Ginseng: resource allocation in the cloud
Assaf Schuster: Raas+Ginseng: resource allocation in the cloudAssaf Schuster: Raas+Ginseng: resource allocation in the cloud
Assaf Schuster: Raas+Ginseng: resource allocation in the cloud
 
Maurizio Naldi: Cloud computing modelling and adoption
Maurizio Naldi:  Cloud computing modelling and adoptionMaurizio Naldi:  Cloud computing modelling and adoption
Maurizio Naldi: Cloud computing modelling and adoption
 
Victor Chang: Cloud computing business framework
Victor Chang: Cloud computing business frameworkVictor Chang: Cloud computing business framework
Victor Chang: Cloud computing business framework
 
Margherita Pagani: Digital business strategy & value creation
Margherita Pagani: Digital business strategy & value creationMargherita Pagani: Digital business strategy & value creation
Margherita Pagani: Digital business strategy & value creation
 
Sebatien Tran: Le SI et ses utilisatueurs…perspectives sur la stratégie; it d...
Sebatien Tran: Le SI et ses utilisatueurs…perspectives sur la stratégie; it d...Sebatien Tran: Le SI et ses utilisatueurs…perspectives sur la stratégie; it d...
Sebatien Tran: Le SI et ses utilisatueurs…perspectives sur la stratégie; it d...
 
Marie-Helène Delmond: Business models in traditional versus pure digital indu...
Marie-Helène Delmond: Business models in traditional versus pure digital indu...Marie-Helène Delmond: Business models in traditional versus pure digital indu...
Marie-Helène Delmond: Business models in traditional versus pure digital indu...
 
Nabil Sultan. The disruptive and democratizing credentials of cloud computing
Nabil Sultan. The disruptive and democratizing credentials of  cloud computingNabil Sultan. The disruptive and democratizing credentials of  cloud computing
Nabil Sultan. The disruptive and democratizing credentials of cloud computing
 
Cinzia Battistella; Modeling a business ecosystem: a network analysis approach
Cinzia Battistella; Modeling a business ecosystem: a network analysis approachCinzia Battistella; Modeling a business ecosystem: a network analysis approach
Cinzia Battistella; Modeling a business ecosystem: a network analysis approach
 
Pierre jean benghozi: Business models and innovation; some lessons in the cul...
Pierre jean benghozi: Business models and innovation; some lessons in the cul...Pierre jean benghozi: Business models and innovation; some lessons in the cul...
Pierre jean benghozi: Business models and innovation; some lessons in the cul...
 
Xunhua Guo: Evolution of e-ordering in the chinese drug distribution industry...
Xunhua Guo: Evolution of e-ordering in the chinese drug distribution industry...Xunhua Guo: Evolution of e-ordering in the chinese drug distribution industry...
Xunhua Guo: Evolution of e-ordering in the chinese drug distribution industry...
 
Cyril Bartolo: European Users’ recommendations for the success of Public Clou...
Cyril Bartolo: European Users’ recommendations for the success of Public Clou...Cyril Bartolo: European Users’ recommendations for the success of Public Clou...
Cyril Bartolo: European Users’ recommendations for the success of Public Clou...
 
Omar el sawy: Digital business models in networked abundance
Omar el sawy: Digital business models in networked abundanceOmar el sawy: Digital business models in networked abundance
Omar el sawy: Digital business models in networked abundance
 
Sjaak Brinkkemper: Visual Business Modeling Techniques for the Software Industry
Sjaak Brinkkemper: Visual Business Modeling Techniques for the Software IndustrySjaak Brinkkemper: Visual Business Modeling Techniques for the Software Industry
Sjaak Brinkkemper: Visual Business Modeling Techniques for the Software Industry
 

Dernier

(中央兰开夏大学毕业证学位证成绩单-案例)
(中央兰开夏大学毕业证学位证成绩单-案例)(中央兰开夏大学毕业证学位证成绩单-案例)
(中央兰开夏大学毕业证学位证成绩单-案例)twfkn8xj
 
Stock Market Brief Deck FOR 4/17 video.pdf
Stock Market Brief Deck FOR 4/17 video.pdfStock Market Brief Deck FOR 4/17 video.pdf
Stock Market Brief Deck FOR 4/17 video.pdfMichael Silva
 
The Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng PilipinasThe Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng PilipinasCherylouCamus
 
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170Sonam Pathan
 
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfmagnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfHenry Tapper
 
Governor Olli Rehn: Dialling back monetary restraint
Governor Olli Rehn: Dialling back monetary restraintGovernor Olli Rehn: Dialling back monetary restraint
Governor Olli Rehn: Dialling back monetary restraintSuomen Pankki
 
Bladex 1Q24 Earning Results Presentation
Bladex 1Q24 Earning Results PresentationBladex 1Q24 Earning Results Presentation
Bladex 1Q24 Earning Results PresentationBladex
 
Call Girls Near Me WhatsApp:+91-9833363713
Call Girls Near Me WhatsApp:+91-9833363713Call Girls Near Me WhatsApp:+91-9833363713
Call Girls Near Me WhatsApp:+91-9833363713Sonam Pathan
 
Kempen ' UK DB Endgame Paper Apr 24 final3.pdf
Kempen ' UK DB Endgame Paper Apr 24 final3.pdfKempen ' UK DB Endgame Paper Apr 24 final3.pdf
Kempen ' UK DB Endgame Paper Apr 24 final3.pdfHenry Tapper
 
Tenets of Physiocracy History of Economic
Tenets of Physiocracy History of EconomicTenets of Physiocracy History of Economic
Tenets of Physiocracy History of Economiccinemoviesu
 
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一S SDS
 
NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...
NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...
NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...Amil baba
 
212MTAMount Durham University Bachelor's Diploma in Technology
212MTAMount Durham University Bachelor's Diploma in Technology212MTAMount Durham University Bachelor's Diploma in Technology
212MTAMount Durham University Bachelor's Diploma in Technologyz xss
 
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证rjrjkk
 
Market Morning Updates for 16th April 2024
Market Morning Updates for 16th April 2024Market Morning Updates for 16th April 2024
Market Morning Updates for 16th April 2024Devarsh Vakil
 
fca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdffca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdfHenry Tapper
 
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...amilabibi1
 
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170Call Girls Near Delhi Pride Hotel, New Delhi|9873777170
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170Sonam Pathan
 
PMFBY , Pradhan Mantri Fasal bima yojna
PMFBY , Pradhan Mantri  Fasal bima yojnaPMFBY , Pradhan Mantri  Fasal bima yojna
PMFBY , Pradhan Mantri Fasal bima yojnaDharmendra Kumar
 
Unveiling Business Expansion Trends in 2024
Unveiling Business Expansion Trends in 2024Unveiling Business Expansion Trends in 2024
Unveiling Business Expansion Trends in 2024Champak Jhagmag
 

Dernier (20)

(中央兰开夏大学毕业证学位证成绩单-案例)
(中央兰开夏大学毕业证学位证成绩单-案例)(中央兰开夏大学毕业证学位证成绩单-案例)
(中央兰开夏大学毕业证学位证成绩单-案例)
 
Stock Market Brief Deck FOR 4/17 video.pdf
Stock Market Brief Deck FOR 4/17 video.pdfStock Market Brief Deck FOR 4/17 video.pdf
Stock Market Brief Deck FOR 4/17 video.pdf
 
The Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng PilipinasThe Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng Pilipinas
 
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
 
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfmagnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
 
Governor Olli Rehn: Dialling back monetary restraint
Governor Olli Rehn: Dialling back monetary restraintGovernor Olli Rehn: Dialling back monetary restraint
Governor Olli Rehn: Dialling back monetary restraint
 
Bladex 1Q24 Earning Results Presentation
Bladex 1Q24 Earning Results PresentationBladex 1Q24 Earning Results Presentation
Bladex 1Q24 Earning Results Presentation
 
Call Girls Near Me WhatsApp:+91-9833363713
Call Girls Near Me WhatsApp:+91-9833363713Call Girls Near Me WhatsApp:+91-9833363713
Call Girls Near Me WhatsApp:+91-9833363713
 
Kempen ' UK DB Endgame Paper Apr 24 final3.pdf
Kempen ' UK DB Endgame Paper Apr 24 final3.pdfKempen ' UK DB Endgame Paper Apr 24 final3.pdf
Kempen ' UK DB Endgame Paper Apr 24 final3.pdf
 
Tenets of Physiocracy History of Economic
Tenets of Physiocracy History of EconomicTenets of Physiocracy History of Economic
Tenets of Physiocracy History of Economic
 
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
 
NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...
NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...
NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...
 
212MTAMount Durham University Bachelor's Diploma in Technology
212MTAMount Durham University Bachelor's Diploma in Technology212MTAMount Durham University Bachelor's Diploma in Technology
212MTAMount Durham University Bachelor's Diploma in Technology
 
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
 
Market Morning Updates for 16th April 2024
Market Morning Updates for 16th April 2024Market Morning Updates for 16th April 2024
Market Morning Updates for 16th April 2024
 
fca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdffca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdf
 
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
 
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170Call Girls Near Delhi Pride Hotel, New Delhi|9873777170
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170
 
PMFBY , Pradhan Mantri Fasal bima yojna
PMFBY , Pradhan Mantri  Fasal bima yojnaPMFBY , Pradhan Mantri  Fasal bima yojna
PMFBY , Pradhan Mantri Fasal bima yojna
 
Unveiling Business Expansion Trends in 2024
Unveiling Business Expansion Trends in 2024Unveiling Business Expansion Trends in 2024
Unveiling Business Expansion Trends in 2024
 

Magdalena Balazinska: Data pricing and data license agreements

  • 1. Magdalena Balazinska DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING ESCIENCE INSTITUTE UNIVERSITY OF WASHINGTON http://www.cs.washington.edu/people/faculty/magda Data Pricing and Data License Agreements in the Cloud 1
  • 2. Acknowledgments Research team on project • Prasang Upadhyaya (UW, lead on licensing) • Paraschos Koutris (UW, lead on pricing) • Prof. Dan Suciu (UW) • Dr. Hakan Hacigumus (NEC Labs) Sponsors • National Science Foundation • NEC • Microsoft Research Magdalena Balazinska - University of Washington 2
  • 3. Data Has Value • First wave of computing: value in hardware – IBM, Intel, DEC • Second wave of computing: value in software – Microsoft, Oracle, Google • Third wave of computing: value in data – Dun & Bradstreet, Factual, Facebook, Google Magdalena Balazinska - University of Washington 3
  • 4. Data itself is now a product that is being created, improved, bought and sold on the Web Magdalena Balazinska - University of Washington 4
  • 5. What are the Technical Challenges • Challenge 1: Data License Agreements – All data comes with terms of use – Can we automate their enforcement? • Challenge 2: Data Pricing – Existing pricing methods are limited – Can we support flexible pricing? Magdalena Balazinska - University of Washington 5
  • 6. Data Comes with Terms of Use Overlaying map data with any other data is prohibited (Navteq) Each book may be lent once for 2 weeks while being inaccessible by the lender (Kindle) Name Ailment Birth date Se x Locati on John Doe Asthma Jan 7th 1972 M Seattle Mary Jane Dislocated shoulder Mar 21st 1965 F San Diego Alice Summer Flu May 28th 1986 M San Francisco … … … … … Bob B Flu Oct 14th 2000 M Miami Medical Data Maps Digital Books Queries that try to identify an individual referenced in the database are prohibited (MIMIC II) 6
  • 7. More Examples Terms of use Source Overlaying Navteq data with any other data is prohibited Navteq Each book may be lent once for a duration of 14 days and will not be readable by the lender during the loan period Amazon Kindle In a month, all queries may, in total, return up to 2M characters of data at the free tier Microsoft Translator OAuth calls are permitted 350 requests per hour Twitter and Foursquare Queries that try to identify an individual referenced in the database are prohibited MIMIC II You are required to display all attribution information and any proprietary notices associated with the Foursquare Data Foursquare, Yelp, World Bank Don’t aggregate or blend our star ratings and review counts with other providers. You may show content from multiple providers, but Yelp data should stand on its own … Yelp 7
  • 8. Fine-grained Access Look up specific patient Augmenting Data Sources Join medical data with voters registry Terms of Use Control Use, Not Access 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 Histogram 0 20000 40000 60000 80000 100000 120000 140000 0 5 10 15 20 25 Linear Regression Data Permitted Denied Goal Enforce policies to constrain how data is used 8 Example for medical research data
  • 9. Today: Written Agreements Magdalena Balazinska - University of Washington 9 … Average length: Over 8 pages!
  • 10. Or Detailed Courses Magdalena Balazinska - University of Washington 10
  • 11. Problem with License Agreements • Burden users with compliance • Assume that users will comply Magdalena Balazinska - University of Washington 11
  • 13. Trust but verify Data D A T A L A W Y E R Honest but curious Honest but careless Malicious 13
  • 14. Approach Overview • Data seller defines policies • Data and policies are loaded into a DataLawyer- enabled database system • Buyer queries the data • DataLawyer checks all queries before execution Magdalena Balazinska - University of Washington 14
  • 15. Challenge: Semantics Example Policy 1: Can access up to 10K records/month. If the buyer computes a histogram on the data and filters out some buckets, did he use the input tuples from the filtered bucket or not? 15
  • 16. Challenge: Performance Example Policy 2: Only allow aggregate queries where each output tuple must aggregate over at least 10 values. Policies are expensive to check online! Cheap Expensive 16
  • 17. DataLawyer Setup 17 Data Metadata on Usage Policies Features of user and query behavior Data Usage Arbitrary code Shared across multiple policies SELECT DISTINCT ‘P5 violated: Fewer than 10 patients contribute to an answer’ AS errorMessage FROM Provenance p WHERE p.irid = ‘patients’ GROUP BY p.qid, p.otid HAVING COUNT(DISTINCT p.itid) < 10 SELECT DISTINCT ‘P5 violated: Fewer than 10 patients contribute to an answer’ AS errorMessage FROM Provenance p WHERE p.irid = ‘patients’ GROUP BY p.qid, p.otid HAVING COUNT(DISTINCT p.itid) < 10 SELECT DISTINCT ‘P5 violated: Fewer than 10 patients contribute to an answer’ AS errorMessage FROM Provenance p WHERE p.irid = ‘patients’ GROUP BY p.qid, p.otid HAVING COUNT(DISTINCT p.itid) < 10 SELECT DISTINCT ‘P5 violated: Fewer than 10 patients contribute to an answer’ AS errorMessage FROM Provenance p WHERE p.irid = ‘patients’ GROUP BY p.qid, p.otid HAVING COUNT(DISTINCT p.itid) < 10 Declarative policies (DataLawyer uses SQL)
  • 18. Usage Log 18 They capture features of a query that are used in the policies We require them to: 1. Be deterministic 2. Be append only 3. Contain a timestamp with each tuple Examples are: 1. Provenance 2. User log 3. Static analysis of the query 4. Pricing 5. …
  • 19. DataLawyer: Operation Data D A T A L A W Y E R 19 f1(Q, D) f2(Q, D) … fm(Q, D) P1 P2 … Pn Query Q Execute query Q(D) Populate usage log Execute policy queries
  • 20. DataLawyer Workflow Example Magdalena Balazinska - University of Washington 20 pid disease treatment outcome 1 asthma albuterol positive … Data: Patients Query: What fraction of asthma patients were treated with albuterol? DataLawyer: Populates the usage logs uid query table column 1 1 Patient treatment 1 1 Patient outcome DataLawyer checks policies, which are queries over the usage logs • Queries are not allowed to access column pid • Queries must aggregate data from at least 10 rows in Patients
  • 21. Example Using SQL 21 Policy: Stop queries where fewer than 10 patients contribute to any output tuple. SELECT DISTINCT ‘P5 violated: Fewer than 10 patients contribute to an answer’ AS errorMessage FROM Provenance p WHERE p.irid = ‘patients’ GROUP BY p.qid, p.otid HAVING COUNT(DISTINCT p.itid) < 10 Usage log that captures how each tuple in query result was derived from records on disk Provenance(ts, // Timestamp qid, // Query id otid, // Output tuple id, a hash of the output tuple irid, // Input relation id, usually the name itid // Input tuple id, usually the name ) If false, no violation If true, at least one example of a violation Policy refers to the provenance usage log
  • 22. Need for Optimizations 22 0 50 100 150 200 250 300 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 Averagetimeforthebatch(inms) Batch number (Each batch consists of 120 queries) Baseline, Hard DataLawyer, Hard Baseline, Easy DataLawyer, Easy Without optimizations, time to process queries & check policies grows Time grows even for simple policies DataLawyer keeps time constant DataLawyer keeps time low
  • 23. Policy Evaluation • There are three major steps: – Generate the usage logs – Evaluate policies – Write log to disk if everything is okay, else abort • Our optimizations: – Avoid generating the logs – Prune the logs by removing data no longer needed – Avoid evaluating all policies – Try to evaluate cheaper, partial policies first 23
  • 24. DataLawyer Performance Illustration Magdalena Balazinska - University of Washington 24
  • 25. Data License Agreements Summary • Data comes with terms of use – Even free data often has terms of use • Today, terms of use are written in natural language – Compliance for buyers is tedious and error-prone • Possible to automate the process: DataLawyer – Enables more precise terms of use specification – Enables efficient enforcement • Open problems – Malicious users – Data leaving database system Magdalena Balazinska - University of Washington 25
  • 26. What are the Technical Challenges • Challenge 1: Data License Agreements – All data comes with terms of use – Can we automate their enforcement? • Challenge 2: Data Pricing – Existing pricing methods are limited – Can we support flexible pricing? Magdalena Balazinska - University of Washington 26
  • 27. Data Pricing Today: Fixed Magdalena Balazinska - University of Washington 27 Not flexible!
  • 28. Data Pricing Today: Subscriptions Magdalena Balazinska - University of Washington 28 Not flexible!
  • 29. Data Pricing Today: Private Price Magdalena Balazinska - University of Washington 29 Not scalable!
  • 30. Example Scenario • Seller has a database of cities and business contact information – Businesses in one province or state: $300 – One type of business: $150 – Cities with given climate: $10 • Buyer: – Q1: “Businesses with more than 200 employees” (selection) – Q2: “West-coast businesses in cities with high yearly precipitation” (join) • How to satisfy buyer? Magdalena Balazinska - University of Washington 30
  • 31. Current Pricing: Fixed Prices • Fixed price for entire dataset • Must create and price views specific to queries Q1 and Q2 • OR user must buy entire dataset if view not available • AND user must perform joins by herself • Certainly the case if datasets have different owners 31Magdalena Balazinska - University of Washington
  • 32. Current Pricing: Subscriptions • Subscriptions – Fixed number of transactions per month – Must create and price appropriate parameterized queries – Today queries are dataset specific (i.e., no joins!) – Can satisfy Q1: “Businesses with more than 200 employees” – Cannot Q2: “West-coast businesses in cities with high yearly precipitation” 32Magdalena Balazinska - University of Washington
  • 33. Other Data Pricing Issues • Today’s data pricing can also have bad properties • Example: Weather Imagery on Azure DataMarket – 1,000,000 transactions -> $2,400 – 100,000 -> $600 – 10,000 -> $120 – 2,500 -> $0 • Arbitrage opportunity: – Emulate many users – Get as much data as you want for free! 33Magdalena Balazinska - University of Washington
  • 34. Data Pricing with QueryMarket
  • 35. Query-Based Pricing • Seller specifies a set of queries Q1, … Qn • These queries form views on the data for sell • Seller prices the views: price(Q1), …, price(Qn) – D = all cities and businesses in North America – V1 (businesses in one state) = $300 – V2 (businesses of one type) = $150 – V3 (cities with a given climate) = $10 35Magdalena Balazinska - University of Washington
  • 36. Query-Based Pricing • QueryMarket system computes other query prices – Q2: “West-coast businesses in cities with high yearly precipitation” – Key idea: Compute least expensive set of views that can be used to answer the query. The sum of the price of these views is the price of the query • System guarantees price properties – Arbitrage-free prices – Maximal prices (no unintended discounts) 36Magdalena Balazinska - University of Washington
  • 37. Conclusion • Data has value • Data is bought and sold online • Supporting modern data markets requires – New tools for managing license agreements – New methods for pricing data • Much work remains to be done http://cloud-data-pricing.cs.washington.edu Magdalena Balazinska - University of Washington 37
  • 38. Magdalena Balazinska - University of Washington 38
  • 39. Potential Techniques Privacy Mechanisms (Reduces data utility) Intrusion Detection System (Assumes the user is malicious. Offline) Access Control (Want full access to data) Auditing Systems (Offline) Online Offline Fuzzy Semantics Precise Semantics None of these work! 39

Notes de l'éditeur

  1. Once hardware become commoditized, the value moved to the software… Dun & Bradstreet: To be the most trusted source of commercial insight so our customers can decide with confidence. With over 225 million records on businesses from around the world, we have more data and insight about businesses than any other information company on the planet. Founded 1841. How they sell their data: It always says: “To get started, contact us”. Factual: Business info, service such as translating lat/long into a neighborhood, data integration APIs (such as mapping location on different services such as Yelp, Foursquare, Facebook, and Twitter. Companies that own data, aggregate & curate data, sell data-related services such as cleaning, integration, and analysis.
  2. Microsoft Azure Marketplace: online market for buying and selling finished Software as a Service (SaaS) applications and premium datasets. 131 free datasets. 95 paid datasets (26 have free trial). Prices are posted. Subscription based. Factual: Focus is on supporting mobile apps. Business info, service such as translating lat/long into a neighborhood, data integration APIs (such as mapping location on different services such as Yelp, Foursquare, Facebook, and Twitter. Private pricing (just like Dun & Bradstreet). AggData: Business contact information. Flexible pricing full datasets, subscriptions to get updates, or personalized data product. Gnip: Provider of social data (real time and historical). For example, real-time “firehose” data. We get every activity from social networks like Twitter, Tumblr, and WordPress, add in our own enrichments and then pass the data on to you, all in the blink of an eye. Also sell subsets of data based on user-specified selection predicates. Private pricing. Patientslikeme: See 25 million data points about disease. We take the information patients like you share about your experience with the disease and sell it to our partners. Xignite: Sell financial market data. Private pricing appears to be subscription based (hits).
  3. The MIMIC-II research database (Multiparameter Intelligent Monitoring in Intensive Care) is notable for three factors: it is publicly and freely available; it encompasses a diverse and very large population of ICU patients; and it contains high temporal resolution data including lab results, electronic documentation, and bedside monitor trends and waveforms. The database can support a diverse range of analytic studies spanning epidemiology, clinical decision-rule improvement, and electronic tool development. What medical researchers actually do is this. You email them saying you want the data. They send you the link to this three hour long course. If you are like me, and use common sense, you flunk it and spend an additional hour retaiking it. Now, all the course does is to tell you what you can’t do with the data which is, you can’t share it with anyone else and you shouldn’t try to infer the identity of patients.
  4. Here are more examples. The matter of interest to us is that these license agreements were over 8 pages of text and impose significant cognitive burden on the users of these datasets. You can do live processing on twitter through Twitter Stream API. But stream api doesn't send you all tweets, you have to specify either keywords to filter tweets containing given keywords or userIds to follow tweets coming from given users or geo location to filter tweets coming from a particular place. Also there are limits on number of keywords, user ids to follow etc. And lastly twitter will give you upmost 1% percent of the tweets in the global traffic. So for example if you come up with a filter which tracks less than 1% of the global trafic you will get all the tweets (in theory) otherwise you will miss some tweet as twitter won't share them with you.
  5. We performed a survey of 13 data providers (Foursquare [8], Yelp [20], Azure Marketplace [17], Twitter [14], In- fochimps [9], Socrata [15], Xignite [19], Digital Folio [6], DataSift [5], World Bank [18], Navteq [12], data.gov.uk [13], and DataMarket [4].
  6. I will give the overall idea. Please find me during the poster session to discuss the algorithms in depth. To check policies given a user query, we first run the various feature extractors, populate them into a log, that we call the usage log, evaluate all the policies and check that all of them are empty. If so, we return the query answers, if not, we give an error message for the policy that failed. We have a bunch of optimizations that try to evaluate policies without evaluating all of the features, since they are the primary overheads.
  7. To check policies given a user query, we first run the various feature extractors, populate them into a log, that we call the usage log, evaluate all the policies and check that all of them are empty. If so, we return the query answers, if not, we give an error message for the policy that failed. We have a bunch of optimizations that try to evaluate policies without evaluating all of the features, since they are the primary overheads. We found that this abstraction is powerful enough to capture a vast number of usage terms we found in real life. Our prototype extracts three features, user information such as user-id and query time. schema information about the input relations of the query and its output columns. where-provenance of tuples. Given just these three tables, it is possible to write very sophisticated policy checking queries. Note that it is also easy to extend the system to newer domains, say query prices in data marketplaces. No additional SQL constructs, and extensible.
  8. Scaleup
  9. The seller needs to carefully design the views to sell so as to cater to as many buyers as she can, without knowing the exact queries from the buyers. Similarly, the buyer needs to carefully evaluate the various views that are available for sale to determine which views suce to answer her query and are also the cheapest way to answer that query
  10. Currently supports only selection queries on a single attribute of a relation
  11. Currently, non-trivial subset of full conjunctive queries without self-joins [8] called the Generalized Chain Queries" (Section 5), a set that includes all path join queries, star join queries, and their combination. an algorithm with polynomial time data complexity for computing the price of any generalized chain query, by reducing the problem to network flow, (2) a complete characterization of the class of Conjunctive Queries without self-joins that can be priced with PTIME data complexity (this class is slightly larger than generalize chain queries), and (3) a proof that pricing all other queries is NP-complete, thus establishing a dichotomy on the complexity of the pricing problem when all views are selection queries. For the queries that are not in PTIME, we reduce the problem to an Integer Linear Pro- gram and compute the prices.
  12. IDS: The user is not adversarial, but a legal user. Highlight about why each doesn’t work. Privacy: We discussed. Access control: Do not want it. Auditing systems: After the fact. Intrusion detection system: Very different aims. Aims to find malicious users whose actions are deliberate. Our users aren’t malicious and offenses. Access control Offline audits Hippocratic databases ----- Meeting Notes (1/4/13 16:47) ----- Instrsion detection argument was bad.