On 2013-11-13 I gave a presentation on the use of the energy labels dataset of the Dutch Ministry of Economic Affairs. I first turned their XML dataset into 5-star LOD (by linking it to the BAG) and then created a Web application that runs on top of it.
2. Context
• Energy label / Energy Performance of Buildings Directive (EPBD)
• Possible values: A - G
• Measurements are valid for 10 years.
• Requirement when buying or renting a house.
3. EnergyLabels dataset in numbers
• 2,354,560 entries
• Energy index
• Electricity consumption
• Gas consumption
• License: Creative Commons 0
• Dissemination date: 2012-11-05
• Updated on a daily basis
• Issued by: Energielabels Agentschap NL
• Related dataset?: Liander Open Data, approx. 1,250,000 entries.
4. Linked Open Data
• Connect to existing datasets.
• Connect to services.
• Run queries across datasets.
• Perform inference across datasets.
• Easy to create mash-ups / new applications.
Only if it is cheap to do all of this will Linked Data be an enabler for large-scale innovation.
(disclaimer: this is a subjective claim)
5. RDF files
[Diagram: data conversion pipeline. Recoverable labels:]
• Sources: relational DB, XML files (depends on structure), text files (ambiguous)
• Domain-independent data conversions: fully automated
• Domain-dependent data conversions: domain knowledge needed
• Simple RDF
• Fixing bad data (origin inconsistencies & inaccuracies)
• Link to external sources (linksets): domain knowledge needed
• Connect to services (e.g. query interface, maps): high level of reuse
6. Technological contribution
• From 3-star (published, open format) to 5-star (Linked Data, URI identifiers, linked to BAG).
• Source data: a 2.6 GB XML document containing one (1!) line :-)
• DOM is too big to hold in RAM.
• Convert to multi-line XML document.
• XML2RDF conversion infrastructure:
• Create a resource using primary/rigid properties.
• Create triples for a resource
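The conversion steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the infrastructure used in the talk: the function names, the element names (`label`, `id`), and the `http://example.org/` base URI are all assumptions made for the example. The first function streams the single-line file in chunks and inserts line breaks between adjacent tags, so the whole document never has to sit in RAM. The second parses incrementally, mints a resource URI from a primary (key) property, and emits one triple per remaining child element.

```python
import io
import xml.etree.ElementTree as ET


def add_line_breaks(src, dst, chunk_size=1 << 20):
    """Copy src to dst, inserting a newline between adjacent tags ('><').

    Works on binary file objects and reads one chunk at a time.
    Assumes the data has no CDATA sections or comments containing '><'.
    """
    prev_ended_with_gt = False
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # A '><' pair may straddle two chunks, so check the boundary.
        if prev_ended_with_gt and chunk.startswith(b"<"):
            dst.write(b"\n")
        dst.write(chunk.replace(b"><", b">\n<"))
        prev_ended_with_gt = chunk.endswith(b">")


def records_to_triples(src, record_tag, key_tag, base_uri):
    """Yield (subject, predicate, object) string triples per record.

    record_tag, key_tag, and base_uri are hypothetical parameters:
    key_tag plays the role of the primary property that names the resource.
    """
    context = ET.iterparse(src, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root
    for event, elem in context:
        if event != "end" or elem.tag != record_tag:
            continue
        key = elem.findtext(key_tag)
        if key and key.strip():
            subject = base_uri + "resource/" + key.strip()
            for child in elem:
                text = (child.text or "").strip()
                if child.tag != key_tag and text:
                    yield (subject, base_uri + "vocab/" + child.tag, text)
        root.clear()  # drop processed records so memory stays bounded
```

A usage sketch on a tiny in-memory document (real input would be opened with `open(path, "rb")`):

```python
single_line = b"<labels><label><id>1</id><class>A</class></label></labels>"
multi_line = io.BytesIO()
add_line_breaks(io.BytesIO(single_line), multi_line)
multi_line.seek(0)
for triple in records_to_triples(multi_line, "label", "id", "http://example.org/"):
    print(triple)
```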
10. Using Linked Data (Wouter’s Inbox)
Dear Wouter,
we gave the students of our Semantic Web class the link to the Kadaster information, and made them enthusiastic to use it. As a result several now have built their apps around this data. But now it has been offline for several days.
Cheers,
Stefan.
11. Main difficulties (1/3)
Technical difficulties due to arbitrary data formatting.
• Publishing data in a sane way decreases the conversion costs considerably.
• In this use case: half of all the effort went into handling the single-line XML document.
12. Main difficulties (2/3)
Institutional difficulties:
• Data publication is a short-duration visible event.
• Data maintenance is a long-duration invisible event.
14. “You can fool all the people some of the time, and some of the people all the time, but you cannot fool all the people all the time.”
Abraham Lincoln
Let's make some substitutions here...
“All LOD datasets are offline some of the time, and some of the LOD datasets are offline all of the time, but not all LOD datasets are offline all of the time.”
Wouter Beek
15. Main difficulties (3/3)
Infrastructural difficulties:
• Assuming that some LOD data is online some of the time, we must explicitly represent the network of interconnected LOD datasets, institutions, and maintainers (DC, FOAF, VoID).
• Anticipating malfunctioning datasets should be a standard part of the development API.
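One way to read the second point is that client code should treat an offline dataset as a normal outcome rather than an exception. The following is a minimal sketch under that assumption, not an existing API: `FetchResult`, `http_get`, and `fetch_with_fallback` are hypothetical names. It tries a list of locations for the same dataset in order (for example a primary endpoint and a mirror) and records why each failed attempt failed, so the application can degrade gracefully or report which maintainer to contact.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional
import urllib.request


@dataclass
class FetchResult:
    data: Optional[bytes]            # payload, or None if every location failed
    source: Optional[str]            # URL that answered, or None
    errors: Dict[str, str] = field(default_factory=dict)  # URL -> failure reason


def http_get(url: str, timeout: float = 10.0) -> bytes:
    """Default fetcher: plain HTTP GET with a timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_fallback(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = http_get,
) -> FetchResult:
    """Try each location in order and return the first successful answer."""
    errors: Dict[str, str] = {}
    for url in urls:
        try:
            return FetchResult(fetch(url), url, errors)
        except OSError as exc:
            # URLError, timeouts, and connection errors all derive from OSError.
            errors[url] = str(exc)
    return FetchResult(None, None, errors)
```

Passing the fetcher as a parameter keeps the fallback logic testable without network access, and the `errors` mapping could feed the kind of dataset/maintainer metadata the first bullet asks for.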
16. Conclusion
Only when the technical, institutional, and infrastructural problems are solved will Linked Data become an enabler for large-scale innovation.