The road to open data enlightenment is paved with nice excuses by Toon Vanagt

THE ROAD TO OPEN DATA
ENLIGHTENMENT
IS PAVED WITH NICE EXCUSES
3rd Dec 2014 Toon Vanagt CEO data.be @Toon

Official Belgian company info sources

Some data.be features
  Autocomplete
  Mashing up gov sources
  Data enrichment
  Financial ratios
  OCR in PDFs
  Entity recognition
  Alerts

Police forces are top users of data.be
On the internet you must always remember: if something of value is free, you’re the product!

Definitions
‘Open knowledge’ is any
content, information or
data that people are
free to use, re-use and
redistribute — without
any legal, technological
or social restriction.
(okfn.org)
 ‘Open data’ and ‘open
content’
mean anyone can freely
access, use, modify, and
share for any purpose —
subject, at most, to
requirements that preserve
provenance and openness.
(opendefinition.org)

Open Data Enlightenment vs Buzz
The Age of Enlightenment is the era from the 1650s to the 1780s in which cultural and intellectual forces emphasized reason, analysis and individualism rather than traditional lines of authority…
The current open data philosophy redefines ‘authority’ too and appeals to the analytical power of citizens, hackers, journalists and entrepreneurs to put data to good use.
Open data:
  fosters a “bottom-up” approach
  encourages users to get more out of the data sets
  delivers unexpected results & insights
Beware of fancy alchemy headlines:
 Open Data Is The New Oil
 Unlocking The Gold Mine
 Turning Government Data Into Gold
  €40 billion boost to the EU's economy each year…

Excuse 1: But how will we make money?
  Does your government (department) really have to make money with open data?
  Open data quickly evolved into a primary state infrastructure & service.
  Open data benefits society as a whole, so why tax usage separately?
  If you still want or have to charge users, follow the spirit of the PSI Directive and limit the charge to your marginal data delivery cost (extra bandwidth).

Who pays for open data gov cost?
1. Government subsidizes the underlying open data department costs as a primary service. Government covers the open data related cost as part of its general expenses.
2. Government agencies charge each other for the cost of data usage between federal, regional and city level departments
3. 11 open data revenue models for government agencies as authentic sources
  3 options at input side
  8 options at output side

Charging the INPUT side
Government makes the user pay for (legally required!) data mutations:
1. Creation of data sets (company creation, alarm system registration, publication of annual accounts,…)
2. Change of data (address move, new stakeholder in company, name changes, corrections…)
3. Deletion of a data set (inactive company, bankruptcy,…)

Downsides of INPUT based revenue model
Introduces financial hurdles
Removes incentives to keep data up to date
Results in lower data quality
Requires higher ‘enforcement’ cost
Requires cost to clean up outdated data sets

Charging the OUTPUT side
1. User pays for individual consultation
2. Basic data are free, but user has to pay to consult extended data or metadata
3. User pays for use of structured data sets (csv, xml, batch, API,..)
4. User pays for real-time data sets, which reflect current state in authentic data source (daily update versus monthly update)
5. User pays for removed data (from archive) or for change log (historic overview)
6. User pays for a Service Level Agreement (e.g. guaranteed bandwidth or service outside business hours)
7. User pays for monitoring keywords (or events) in (or about) certain data sets to receive alerts (push notifications, e-mails, SMS,…)
8. User pays for custom benchmarking, segmentations, ratios or advanced filtering options

Downsides of OUTPUT based revenue model
  Financial hurdle for ‘newcomers’
  Reduces innovation and consolidates the ‘status quo’
  Inequality (more for those who can pay, higher service through faster access, better informed)
  Results in limited usage and applications
  Requires costs for billing & payment system with back office operations

Gazette / Belgisch Staatsblad / Moniteur
Input based:
1. Creation of data sets (company creation, publication of annual accounts,…)
2. Change of data (address move, name changes, capital changes, new stakeholders…)

Belgian example 2:
National Bank Balance sheets
Input
  Pay for publication of annual accounts (274 EUR for BVBA/SPRL = limited liability company)
Output
  User pays for use of structured data sets via a webservice (roughly between 1.850 EUR and 15.000 EUR per year).
  User pays for old archived data sets which are no longer shown on the National Bank’s website
  User pays for custom industry benchmarking and ratios of competitors, customers or prospects (benchmarking one company you own yourself remains free)

Belgian example 3:
Crossroads Bank for Enterprises
Input
  Creation of data sets
  Change of data, such as an address move or registering an extra business entity,…
Output
  User pays for use of structured data sets (copy of the public part of the database with names of company stakeholders and self-employed persons, at 75.000 EUR/year)
  User pays for real-time data sets, which reflect current state in authentic data source (daily update versus monthly update) via API (2.000 API requests for 50 EUR in prepaid balance)
  User pays for removed data or for change log (historic overview)
  User pays for a Service Level Agreement (e.g. guaranteed bandwidth or service outside business hours)
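
The two output prices above can be put side by side with a little arithmetic. The sketch below only uses the figures quoted on this slide (dots read as thousands separators); the daily request volume is a hypothetical input, and the comparison ignores that the bulk copy and the real-time API are not the same product.

```python
# Figures quoted on the slide (dots are European thousands separators).
PREPAID_EUR = 50.0                 # price of one prepaid API bundle
REQUESTS_PER_BUNDLE = 2000         # requests included in that bundle
BULK_COPY_EUR_PER_YEAR = 75000.0   # yearly price of the database copy

cost_per_request = PREPAID_EUR / REQUESTS_PER_BUNDLE


def yearly_api_cost(requests_per_day: int, days: int = 365) -> float:
    """Yearly API spend for a hypothetical, constant daily request volume."""
    return requests_per_day * days * cost_per_request


# Number of API requests per year that would cost as much as the bulk copy.
break_even_requests = BULK_COPY_EUR_PER_YEAR / cost_per_request

print(f"{cost_per_request:.3f} EUR per request")
print(f"{break_even_requests:,.0f} requests/year cost as much as the bulk copy")
print(f"{yearly_api_cost(1000):,.2f} EUR/year at 1000 requests/day")
```

At 0,025 EUR per request, a re-user would need about 3 million API requests a year before the API costs as much as the bulk copy.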

Avoid conflict of interest for gov agencies
  Battle for budget: creates competition between government agencies
  Inequality in support services and quality between paying and non-paying customers or agencies
  Battle to secure authentic source as single gatekeeper and extend reach
  Creates competition with the private sector, because government agencies act as commercial data brokers selling personal contact details wholesale to intermediaries

Excuse 2:
Our data quality is too low to release
  Open Data is not your real challenge, you have much bigger data quality issues…
  Accuracy: Is the data correctly representing the real-world entity or event?
  Completeness: Does the data include all data items representing the entity or event?
  Conformance: Is the data following accepted standards?
  Consistency: Is the data free of contradictions?
  Credibility: Is the data based on trustworthy sources?
  Processability: Is the data machine-readable?
  Relevance: Does the data include an appropriate amount of data?
  Timeliness: Is the data representing the actual situation and is it published soon enough?
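
Some of these dimensions can be measured before release rather than used as an excuse. The sketch below scores three of them (completeness, conformance, timeliness) on a toy data set. The records, the field names, the four-digit postcode rule and the one-year freshness threshold are all assumptions made for illustration, not data.be's actual checks.

```python
from datetime import date

# Hypothetical company records; field names and values are illustrative only.
records = [
    {"id": "0123.456.789", "name": "Example BVBA", "postcode": "1000",
     "updated": date(2014, 11, 20)},
    {"id": "0987.654.321", "name": "", "postcode": "10O0",
     "updated": date(2012, 1, 5)},
]

REQUIRED_FIELDS = ("id", "name", "postcode", "updated")


def completeness(rows):
    """Share of required fields that are actually filled in."""
    filled = sum(1 for r in rows for f in REQUIRED_FIELDS if r.get(f))
    return filled / (len(rows) * len(REQUIRED_FIELDS))


def conformance(rows):
    """Share of rows whose postcode is exactly four digits (assumed format)."""
    ok = sum(1 for r in rows
             if len(r["postcode"]) == 4 and r["postcode"].isdigit())
    return ok / len(rows)


def timeliness(rows, today, max_age_days=365):
    """Share of rows updated within the last max_age_days (assumed threshold)."""
    ok = sum(1 for r in rows if (today - r["updated"]).days <= max_age_days)
    return ok / len(rows)


today = date(2014, 12, 3)
print("completeness:", completeness(records))
print("conformance: ", conformance(records))
print("timeliness:  ", timeliness(records, today))
```

Publishing such scores next to the data set is one way to be open about quality instead of withholding the data.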

the process and partner chain is not…
  Document data process partners
  Describe the steps in the information chain upstream of your authentic source (data.be had to reverse engineer processes)

some privacy sensitive data elements…
  Keep the lawyers out of your open data project if you want to make a fast start
  It’s complicated
  It’s Personal
  Privacy concept evolves over time and is culturally defined
  Many grey zones
  Don’t forget to try to anonymise your unstructured data too… accidents will happen
  We can technologically do much more than we are permitted to culturally, morally or legally…
  Beware that very few data points are needed to identify a person in this big data era. Eloquently phrased by Jonathan Mayer: “The idea of personally identifiable information not being identifiable is completely laughable in computer-science circles”.
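
How few data points are enough can be checked on your own data before publishing. The sketch below counts how many rows share each combination of quasi-identifiers (a k-anonymity style check). The rows and the choice of postcode, birth year and gender as quasi-identifiers are hypothetical, and passing such a check does not prove that a data set is safe to release.

```python
from collections import Counter

# Hypothetical "anonymised" rows: names removed, quasi-identifiers kept.
rows = [
    {"postcode": "1000", "birth_year": 1975, "gender": "F"},
    {"postcode": "1000", "birth_year": 1975, "gender": "F"},
    {"postcode": "2000", "birth_year": 1982, "gender": "M"},
    {"postcode": "9000", "birth_year": 1990, "gender": "F"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "gender")


def group_sizes(data, keys=QUASI_IDENTIFIERS):
    """Count rows per combination of quasi-identifier values."""
    return Counter(tuple(r[k] for k in keys) for r in data)


def k_anonymity(data, keys=QUASI_IDENTIFIERS):
    """Smallest group size; k == 1 means at least one row is unique."""
    return min(group_sizes(data, keys).values())


def unique_rows(data, keys=QUASI_IDENTIFIERS):
    """Rows that nobody else shares, i.e. candidates for re-identification."""
    sizes = group_sizes(data, keys)
    return [r for r in data if sizes[tuple(r[k] for k in keys)] == 1]


print("k =", k_anonymity(rows))
print("unique rows:", unique_rows(rows))
```

Here three harmless-looking columns already single out two of the four people.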

Excuse 5: On second thought, we’re not that open…
Availability: Can the data be accessed now and over time?
Be consistent and offer long term commitments and stable data set formats (integration mapping)
Data.be received a ‘Cease & Desist’ after a government hackathon: “Our government website is the only authentic source for air quality measurement. Stop using our data immediately or …”

Excuse 6: We opened the data in a layer on our WMS…
  Web Map Service (WMS) is a standard protocol for serving geo-referenced map images over the Internet that are generated by a map server using data from a GIS database.
  It is very hard to share the layer data… in other applications
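
The difference shows up in the request itself. The sketch below builds a WMS 1.3.0 GetMap URL, which returns a rendered picture, next to a WFS 2.0.0 GetFeature URL, the OGC counterpart that returns the features themselves. The endpoint, the layer names and the bounding box are hypothetical, and the JSON output format is an assumption, since supported formats differ per server.

```python
from urllib.parse import urlencode

BASE = "https://geo.example.org/ows"  # hypothetical endpoint


def wms_getmap_url(layer, bbox, width=800, height=600):
    """WMS GetMap: the server answers with an image, not with the data."""
    params = {
        "SERVICE": "WMS", "VERSION": "1.3.0", "REQUEST": "GetMap",
        "LAYERS": layer, "STYLES": "", "CRS": "EPSG:4326",
        # WMS 1.3.0 with EPSG:4326 expects minLat,minLon,maxLat,maxLon.
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width, "HEIGHT": height, "FORMAT": "image/png",
    }
    return f"{BASE}?{urlencode(params)}"


def wfs_getfeature_url(type_name, output_format="application/json"):
    """WFS GetFeature: the server answers with the features themselves."""
    params = {
        "SERVICE": "WFS", "VERSION": "2.0.0", "REQUEST": "GetFeature",
        "TYPENAMES": type_name,
        "OUTPUTFORMAT": output_format,  # assumption: server-dependent value
    }
    return f"{BASE}?{urlencode(params)}"


# Illustrative bounding box roughly covering Belgium.
print(wms_getmap_url("air_quality", (49.5, 2.5, 51.5, 6.4)))
print(wfs_getfeature_url("air_quality_stations"))
```

A PNG tile is fine for looking at a map, but a re-user who wants to combine the measurements with other sources needs the second kind of answer, or simply a downloadable file.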

Next frontiers for Open Data
 Linked & graph data
 Metadata
 Unstructured data
 Structured feedback loops

Gatekeepers to the rescue
  Don’t just ‘input’ the data which are presented
  Inform general public on long term use of their ‘public’ data.
  Once online, always online…
  Evangelise the use of open data inside and outside your organisation

Open up your organisation
  Invite a data scientist to work. Share insights internally, learn, optimize quality of data sets
  Be open about quality and refresh rates
  Specify the license under which the data may be re-used.
  Provide a feedback loop (for now, data.be often is the feedback channel for outdated company data…)
  Maintenance of metadata and data is critical!
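
Most of this checklist fits in a small metadata record published next to each data set. The sketch below is one possible shape, not an official schema: every field name and value is hypothetical, and the license identifier is only an example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetMetadata:
    """Hypothetical minimal metadata record; not an official schema."""
    title: str
    license: str               # the terms under which re-use is allowed
    refresh_rate: str          # how often the published copy is updated
    last_updated: str          # ISO 8601 date of the latest refresh
    known_quality_issues: list
    feedback_contact: str      # where re-users can report errors


def missing_fields(meta: DatasetMetadata) -> list:
    """Names of text fields that were left empty."""
    return [name for name, value in asdict(meta).items() if value in ("", None)]


meta = DatasetMetadata(
    title="Example company register extract",
    license="CC0-1.0",
    refresh_rate="daily",
    last_updated="2014-12-03",
    known_quality_issues=["addresses of inactive companies may be outdated"],
    feedback_contact="opendata@example.org",
)

print(json.dumps(asdict(meta), indent=2))
print("missing:", missing_fields(meta))
```

Checking for empty fields at publication time is a cheap way to keep the license and the feedback contact from being forgotten.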

Toon Vanagt CEO toon@data.be
@Toon
THANK YOU
3rd Dec 2014 #OUP14
Opening up conference in Brussels

Picture copyright & attribution
  The brick laying machine pictures can be found at Tiger Stone: http://www.tiger-stone.nl/index.php?option=com_content&view=article&id=47&Itemid=55
  Keep calm cup: http://www.keepcalm-o-matic.co.uk/product/mug/keep-calm-and-open-up-67/
  Storify with pictures of opening-up.eu event: https://storify.com/openingup_eu/opening-up-final-conference-1
