The road to open data enlightenment is paved with nice excuses! These slides present 11 open data revenue models for government agencies that 'pragmatically' need to keep generating revenue as 'authentic sources'. This presentation was delivered by Toon Vanagt of https://data.be as the opening keynote of the 'opening-up' conference in Brussels on 3 December 2014.
3. Some data.be features
Autocomplete
Mashing up gov sources
Data enrichment
Financial ratios
OCR in PDFs
Entity recognition
Alerts
4. Police forces are top users of data.be
On the internet you must always remember: if something of value is free, you’re the product!
5. Definitions
‘Open knowledge’ is any content, information or data that people are free to use, re-use and redistribute, without any legal, technological or social restriction. (okfn.org)
‘Open data’ and ‘open content’ mean anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness). (opendefinition.org)
6. Open Data Enlightenment vs Buzz
The Age of Enlightenment is the era from the 1650s to the 1780s in which cultural and intellectual forces emphasized reason, analysis and individualism rather than traditional lines of authority…
The current open data philosophy redefines ‘authority’ too and appeals to the analytical power of citizens, hackers, journalists and entrepreneurs to put data to good use.
Open data:
fosters a ‘bottom-up’ approach
encourages users to get more out of the data sets
delivers unexpected results & insights
Beware of fancy alchemy headlines:
Open Data Is The New Oil
Unlocking The Gold Mine
Turning Government Data Into Gold
€40 billion boost to the EU's economy each year…
7. Excuse 1: But how will we make money?
Does your government (department) really have to make money with open data?
Open data quickly evolved into primary state infrastructure & service.
Open data benefits society as a whole, so why tax usage separately?
If you still want or have to charge users, limit the cost, in the spirit of the PSI (Public Sector Information) Directive, to your marginal data delivery expense (extra bandwidth).
8. Who pays for open data gov cost?
1. Government subsidizes underlying open data department costs as a primary service. Government covers the open data related cost as part of its general expenses.
2. Government agencies charge each other for the cost of data usage between federal, regional and city level departments.
3. 11 open data revenue models for government agencies as authentic sources:
3 options on the input side
8 options on the output side
9. Charging the INPUT side
Government makes the user pay for (legally required!) data mutations:
1. Creation of data sets (company creation, alarm system registration, publication of annual accounts, …)
2. Change of data (address move, new stakeholder in company, name changes, corrections, …)
3. Deletion of a data set (inactive company, bankruptcy, …)
10. Downsides of INPUT based revenue model
Introduces financial hurdles
Removes incentives to keep data up to date
Results in lower data quality
Requires higher ‘enforcement’ cost
Requires cost to clean up outdated data sets
11. Charging the OUTPUT side
1. User pays for individual consultation
2. Basic data are free, but user has to pay to consult extended data or metadata
3. User pays for use of structured data sets (CSV, XML, batch, API, …)
4. User pays for real-time data sets, which reflect current state in authentic data source (daily update versus monthly update)
5. User pays for removed data (from archive) or for change log (historic overview)
6. User pays for a Service Level Agreement (e.g. guaranteed bandwidth or service outside business hours)
7. User pays for monitoring keywords (or events) in (or about) certain data sets to receive alerts (push notifications, e-mails, SMS, …)
8. User pays for custom benchmarking, segmentations, ratios or advanced filtering options
12. Downsides of OUTPUT based revenue model
Financial hurdle for ‘newcomers’
Reduces innovation and consolidates the ‘status quo’
Inequality (more for those who can pay, higher service through faster access, better informed)
Results in limited usage and applications
Requires costs for billing & payment system with back office operations
13. Gazette / Belgisch Staatsblad / Moniteur
Input based:
1. Creation of data sets (company creation, publication of annual accounts, …)
2. Change of data (address move, name changes, capital changes, new stakeholders, …)
14. Belgian example 2: National Bank balance sheets
Input
Pay for publication of annual accounts (274 EUR for a BVBA/SPRL, i.e. a limited liability company)
Output
User pays for use of structured data sets via a web service (roughly between 1,850 EUR and 15,000 EUR per year).
User pays for old archived data sets which are no longer shown on the National Bank’s website
User pays for custom industry benchmarking and ratios of competitors, customers or prospects (but benchmarking of one’s own company remains free)
15. Belgian example 3: Crossroads Bank for Enterprises
Input
Creation of data sets
Change of data, such as address move or registering extra business entity, …
Output
User pays for use of structured data sets (copy of public part of database with names of company stakeholders and self-employed persons at 75,000 EUR/year)
User pays for real-time data sets, which reflect current state in authentic data source (daily update versus monthly update) via API (2,000 API requests for 50 EUR in prepaid balance)
User pays for removed data or for change log (historic overview)
User pays for a Service Level Agreement (e.g. guaranteed bandwidth or service outside business hours)
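The two Crossroads Bank price points can be put side by side with a little arithmetic. The following is only a sketch: it takes the slide's figures at face value (75,000 EUR/year for the bulk copy, 2,000 API requests per 50 EUR of prepaid balance) and assumes a flat per-request rate. The function and constant names are illustrative, and since the bulk copy and the API do not deliver identical content, the break-even point is indicative at best.

```python
from fractions import Fraction

# Figures quoted on the slide, taken at face value.
BULK_COPY_EUR_PER_YEAR = 75_000   # yearly copy of the public part of the database
API_BUNDLE_EUR = 50               # prepaid balance
API_BUNDLE_REQUESTS = 2_000       # requests covered by that balance


def price_per_request() -> Fraction:
    """Price of a single API request in EUR (assumption: flat rate)."""
    return Fraction(API_BUNDLE_EUR, API_BUNDLE_REQUESTS)


def api_cost(requests: int) -> Fraction:
    """Cost in EUR of a given number of API requests."""
    return requests * price_per_request()


def break_even_requests() -> int:
    """Yearly request count at which the API costs as much as the bulk copy."""
    return int(BULK_COPY_EUR_PER_YEAR / price_per_request())


if __name__ == "__main__":
    print(float(price_per_request()))   # EUR per request
    print(break_even_requests())        # requests per year
    print(float(api_cost(100_000)))     # EUR for 100,000 requests
```

Under these assumptions a request costs 0.025 EUR, so a newcomer making 100,000 requests a year would pay 2,500 EUR, and the API only becomes as expensive as the bulk copy at 3 million requests a year.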
16. Avoid conflict of interest for gov agencies
Battle for budget: creates competition between government agencies
Inequality in support services and quality between paying and non-paying customers or agencies
Battle to secure authentic source as single gatekeeper and extend reach
Creates competition with the private sector, because government agencies act as commercial data brokers selling wholesale personal contact details to intermediaries
17. Excuse 2: Our data quality is too low to release
Open Data is not your real challenge, you have much bigger data quality issues…
Accuracy: Does the data correctly represent the real-world entity or event?
Completeness: Does the data include all data items representing the entity or event?
Conformance: Does the data follow accepted standards?
Consistency: Is the data free of contradictions?
Credibility: Is the data based on trustworthy sources?
Processability: Is the data machine-readable?
Relevance: Does the data include an appropriate amount of data?
Timeliness: Does the data represent the actual situation and is it published soon enough?
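Some of these dimensions can be measured with very little code. The sketch below scores completeness and timeliness for a handful of made-up company records. The field names, the 30-day freshness threshold and the sample data are all assumptions chosen for illustration, not anything data.be or a government source prescribes.

```python
from datetime import date

# Hypothetical schema: fields every record is expected to carry.
REQUIRED_FIELDS = ("company_number", "name", "address")


def completeness(records):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(all(r.get(f) for f in REQUIRED_FIELDS) for r in records)
    return complete / len(records)


def timeliness(records, today, max_age_days=30):
    """Share of records updated within the last `max_age_days` days."""
    if not records:
        return 0.0
    fresh = sum((today - r["last_updated"]).days <= max_age_days for r in records)
    return fresh / len(records)


# Made-up sample data.
SAMPLE = [
    {"company_number": "0001", "name": "Alpha", "address": "Street 1",
     "last_updated": date(2014, 11, 20)},
    {"company_number": "0002", "name": "Beta", "address": "",
     "last_updated": date(2014, 6, 1)},
]

if __name__ == "__main__":
    print(completeness(SAMPLE))
    print(timeliness(SAMPLE, today=date(2014, 12, 3)))
```

Publishing scores like these next to a data set is one way to be open about quality instead of using it as a reason not to release.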
18. the process and partner chain is not…
Document data process partners
Describe the steps in the information chain upstream of your authentic source (data.be had to reverse engineer processes)
19. some privacy sensitive data elements…
Keep the lawyers out of your open data project if you want to make a fast start
It’s complicated
It’s personal
The concept of privacy evolves over time and is culturally defined
Many grey zones
Don’t forget to try to anonymise your unstructured data too… accidents will happen
We can technologically do much more than we are permitted to culturally, morally or legally…
Beware that very few data points are needed to identify a person in this big data era. Eloquently phrased by Jonathan Mayer: “The idea of personally identifiable information not being identifiable is completely laughable in computer-science circles”.
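How few data points are needed can be checked on your own data before release. This minimal sketch counts how many records become unique once a few harmless-looking attributes are combined. The attribute names and the four sample records are invented for illustration. A real assessment would use the full data set and a proper disclosure-risk method.

```python
from collections import Counter


def unique_share(records, quasi_identifiers):
    """Share of records that are the only one with their combination of values."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Invented sample: no names, yet combinations can still single people out.
PEOPLE = [
    {"postcode": "1000", "birth_year": 1980, "gender": "F"},
    {"postcode": "1000", "birth_year": 1980, "gender": "M"},
    {"postcode": "1000", "birth_year": 1975, "gender": "F"},
    {"postcode": "9000", "birth_year": 1980, "gender": "F"},
]

if __name__ == "__main__":
    for columns in (["postcode"],
                    ["postcode", "birth_year"],
                    ["postcode", "birth_year", "gender"]):
        print(columns, unique_share(PEOPLE, columns))
```

In this toy sample the share of uniquely identifiable records grows from a quarter to half to all of them as one, two and three attributes are combined.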
20. Excuse 5: On second thought, we’re not that open…
Availability: Can the data be accessed now and over time?
Be consistent and offer long term commitments and stable data set formats (integration mapping)
Data.be received a ‘Cease & Desist’ after a government hackathon: “Our government website is the only authentic source for air quality measurement. Stop using our data immediately or …”
21. Excuse 6: We opened the data in a layer on our WMS…
Web Map Service (WMS) is a standard protocol for serving geo-referenced map images over the Internet that are generated by a map server using data from a GIS database.
It is very hard to share the layer data… in other applications
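A look at a WMS request makes the problem concrete. The sketch below builds a standard WMS 1.3.0 GetMap URL. The endpoint https://example.org/wms and the layer name air_quality are placeholders, not a real service. Note the FORMAT parameter: what comes back is a rendered picture of the layer, not the measurements behind it.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layer, bbox, width=512, height=512):
    """Build a WMS 1.3.0 GetMap URL; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",             # empty value = server default style
        "CRS": "CRS:84",          # longitude/latitude axis order
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",    # the response is an image, not raw data
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and layer; bounding box roughly around Brussels.
EXAMPLE_URL = getmap_url("https://example.org/wms", "air_quality",
                         (4.3, 50.8, 4.45, 50.9))

if __name__ == "__main__":
    print(EXAMPLE_URL)
```

A re-user who wants the underlying values would have to reverse-engineer them from pixels, which is why a map layer alone does not count as opening the data.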
22. Next frontiers for Open Data
Linked & graph data
Metadata
Unstructured data
Structured feedback loops
24. Gatekeepers to the rescue
Don’t just ‘input’ the data which are presented
Inform the general public on long-term use of their ‘public’ data.
Once online, always online…
Evangelise the use of open data inside and outside your organisation
25. Open up your organisation
Invite a data scientist to work.
Share insights internally, learn, optimize quality of data sets
Be open about quality and refresh rates
Specify the license under which the data may be re-used.
Provide a feedback loop (data.be is now often the feedback channel for outdated company data…)
Maintenance of metadata and data is critical!
26. Toon Vanagt CEO toon@data.be
@Toon
THANK YOU
3rd Dec 2014 #OUP14
Opening up conference in Brussels
28. Picture copyright & attribution
The brick laying machine pictures can be found at Tiger Stone: http://www.tiger-stone.nl/index.php?option=com_content&view=article&id=47&Itemid=55
Keep calm cup: http://www.keepcalm-o-matic.co.uk/product/mug/keep-calm-and-open-up-67/
Storify with pictures of the opening-up.eu event: https://storify.com/openingup_eu/opening-up-final-conference-1
Editor's Notes
“Data is the new oil” turns out to have meant: the cost of storage & compute cycles will go down faster than you can imagine!
This feels like preaching to the Open Data Choir / So I’m keeping this short
Austerity. More efficient gov. Do more with less money
Good intentions on gov site…
Basic infrastructure & service: like roads & parks were in the past
Bandwidth & storage are cheap nowadays
1. = Ideal
2. & 3. are pragmatic
Hurdle and limit
Xbrl + pdf
Free monthly dataset
Loopholes and unofficial or undocumented access & backdoors…