Quality checklist for registers applied to online price information and offline route information.
Saskia J.L. Ossen, Piet J.H. Daas and Marco Puts
Statistics Netherlands
May 5, 2010, Helsinki, Finland
2. Overview
Introduction
Quality framework for registers
Checklist for registers
Application of the checklist to other data sources
• Offline route information
• Online (internet) price information
Results
Conclusions
Future work
3. Introduction
Statistics Netherlands wants to increase the use of
data (sources) collected and maintained by others
• Not only registers and administrative data sources
• But also other data sources
– internet
– route information
– ….
As a result, Statistics Netherlands:
• Becomes more dependent on data sources from others
• Must be able to monitor the quality of those data sources
– How?
– By applying the previously developed checklist for registers?
4. Quality framework for registers
Statistics Netherlands has developed a framework
for determining the quality of registers
It is composed of:
• 3 high-level views on quality (hyperdimensions)
• Each view focuses on a different group of quality aspects
7. 3 different high-level views on quality
SOURCE:
- Focuses on the data source as a whole
- Mainly delivery-related aspects
- And some other aspects
METADATA:
- Focuses on the (availability of the) information required to understand and use the data in the data source
DATA:
- Technical checks
- Accuracy-related issues
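8. Framework composition
• HYPERDIMENSION: Source, Metadata, Data
– Each hyperdimension contains n > 1 dimensions
• DIMENSION: 5 for Source, 4 for Metadata
– Each dimension has n >= 1 quality indicators
• QUALITY INDICATOR
– 1:n, each indicator has one or more measurement methods
• MEASUREMENT METHOD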
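The hierarchy above can be sketched as a small data model. This is a minimal illustration under stated assumptions: the Source and Metadata dimension names are taken from the evaluation tables in this presentation, the Data hyperdimension is left out because its dimensions are not listed here, and the quality indicator and measurement method shown are hypothetical placeholders, not items from the actual checklist.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QualityIndicator:
    name: str
    # 1:n relation: every indicator has one or more measurement methods
    measurement_methods: list[str]

    def __post_init__(self):
        if not self.measurement_methods:
            raise ValueError(f"{self.name}: needs at least one measurement method")


@dataclass
class Dimension:
    name: str
    # n >= 1 indicators per dimension once the checklist is filled in
    indicators: list[QualityIndicator] = field(default_factory=list)


@dataclass
class Hyperdimension:
    name: str
    dimensions: list[Dimension]

    def __post_init__(self):
        # n > 1: a hyperdimension groups more than one dimension
        if len(self.dimensions) <= 1:
            raise ValueError(f"{self.name}: needs more than one dimension")


SOURCE = Hyperdimension("Source", [Dimension(n) for n in (
    "Supplier", "Relevance", "Privacy and security", "Delivery", "Procedures")])
METADATA = Hyperdimension("Metadata", [Dimension(n) for n in (
    "Clarity", "Comparability", "Unique keys", "Data treatment")])

# Hypothetical indicator and method, for illustration only (not from the checklist)
delivery = SOURCE.dimensions[3]
delivery.indicators.append(QualityIndicator(
    "Punctuality of delivery", ["Compare agreed and actual delivery dates"]))
```

Validating the cardinalities in `__post_init__` means a malformed hyperdimension (for example one with a single dimension) is rejected as soon as it is constructed.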
9. Determine Source and Metadata quality
With a checklist
• Used for both Source and Metadata
Extensively tested on registers
What about other data sources?
10. Apply checklist to other sources
(1) Offline route information
• For Transport statistics
– Check number of km driven
– Border crossing(s)
Price information on the internet (www)
• (2) Flight ticket prices (manual and automatic)
• (3) Supermarket product prices
• (4) House prices
• (5) Product prices of unmanned filling stations
11. Approach used for testing checklist
Applied the checklist to 5 data sources
1. Looked at the scores obtained
• Identify quality issues
2. Ease of use of the checklist
• Applicability of questions
3. Missing quality aspects
• Are any indicators missing?
12. Checklist scores (1) - Source
Table 1 Evaluation results for the Source hyperdimension
Dimension            | Offline route | Internet prices
                     | information   | Supermarket | Houses | Filling stations | Flight tickets
---------------------+---------------+-------------+--------+------------------+---------------
Supplier             | +             | ?           | ?      | ?                | ?
Relevance            | +             | +           | ?      | ?                | +
Privacy and security | +             | +           | +      | +                | +
Delivery             | +             | +           | +      | +                | +
Procedures           | +/o           | o/+         | o/+    | o                | o
+, good; o, reasonable; -, poor; ?, unclear
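Step 1 of the approach (looking at the scores to identify quality issues) becomes mechanical once the table is stored as data. A minimal sketch, assuming a flagging rule of our own: a cell is flagged when it is unclear (?) or when its first judgement is reasonable or poor, so "+/o" is read as mostly good and "o/+" as mostly reasonable. This reading of the combined scores is an assumption, and `flag_issues` is a hypothetical helper name.

```python
# Columns of the Source table: offline route information, then four internet price sources
SOURCES = ("Offline route information", "Supermarket prices", "House prices",
           "Filling station prices", "Flight ticket prices")

# Scores copied from the Source table: + good, o reasonable, - poor, ? unclear
SOURCE_SCORES = {
    "Supplier":             ("+", "?", "?", "?", "?"),
    "Relevance":            ("+", "+", "?", "?", "+"),
    "Privacy and security": ("+", "+", "+", "+", "+"),
    "Delivery":             ("+", "+", "+", "+", "+"),
    "Procedures":           ("+/o", "o/+", "o/+", "o", "o"),
}


def first_judgement(score):
    """Leading part of a combined score, e.g. '+/o' -> '+' (assumed reading)."""
    return score.split("/")[0].strip()


def flag_issues(scores, sources=SOURCES):
    """Return (dimension, source, score) for every cell that is unclear or below good."""
    issues = []
    for dimension, row in scores.items():
        for source, score in zip(sources, row):
            if first_judgement(score) in ("?", "o", "-"):
                issues.append((dimension, source, score))
    return issues


for dimension, source, score in flag_issues(SOURCE_SCORES):
    print(f"{source:<25} {dimension:<12} {score}")
```

Under this rule no cell is flagged for offline route information, while the flagged internet cells concentrate on Supplier, Relevance and Procedures.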
13. Source conclusions
Route information closely resembles registers; no
quality issues were identified
Internet data are more difficult:
• Who supplies the price information on a website?
• Legal issues of collecting data via websites
• Websites change, often unexpectedly
• No real deliveries when collecting internet data
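Unexpected website changes break automated price collection. One possible way to notice them, sketched here as an assumption and not as the method used in this study, is to fingerprint the page layout (tag names and class attributes) while ignoring the text, so that a price change leaves the fingerprint unchanged but a redesign does not. The HTML snippets, class names and the `layout_fingerprint` helper are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Collects 'tag.class' for every start tag; page text (prices) is ignored."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attribute values can be None, so fall back to an empty class string
        classes = dict(attrs).get("class") or ""
        self.tags.append(f"{tag}.{classes}")


def layout_fingerprint(html: str) -> str:
    """SHA-256 over the sequence of tags and classes of a page."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return hashlib.sha256("|".join(collector.tags).encode("utf-8")).hexdigest()


old = '<div class="product"><span class="price">1.29</span></div>'
new_price = '<div class="product"><span class="price">1.35</span></div>'
redesigned = '<div class="item"><p class="cost">1.35</p></div>'
print(layout_fingerprint(old) == layout_fingerprint(new_price))
print(layout_fingerprint(old) == layout_fingerprint(redesigned))
```

Storing the fingerprint after each successful collection run and comparing it on the next run would signal that the extraction rules may need review.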
14. Checklist scores (2) - Metadata
Table 1 Evaluation results for the Metadata hyperdimension
Dimension            | Offline route | Internet prices
                     | information   | Supermarket | Houses | Filling stations | Flight tickets
---------------------+---------------+-------------+--------+------------------+---------------
Clarity              | +             | +/o         | +/o    | +/o              | +/o
Comparability        | +             | +           | ?      | ?                | +
Unique keys          | +             | +           | +      | +                | +
Data treatment       | o             | +           | +      | +                | +
+, good; o, reasonable; -, poor; ?, unclear
15. Metadata conclusions
No major issues for the Metadata part of the checklist
Route information: no problems
Internet data: somewhat more difficult
• Clarity of the internet population
• Clarity of the time periods to which prices refer
16. Checklist applicability
Table 5 Applicability of the quality checklist for the Source hyperdimension
Dimension            | Offline route information | Internet prices
---------------------+---------------------------+----------------
Supplier             | +                         | -
Relevance            | +                         | +
Privacy and security | o                         | o
Delivery             | +                         | -
Procedures           | +                         | o

Table 6 Applicability of the quality checklist for the Metadata hyperdimension
Dimension            | Offline route information | Internet prices
---------------------+---------------------------+----------------
Clarity              | +                         | +
Comparability        | +                         | +
Unique keys          | +                         | +
Data treatment       | +                         | o
relevant (+), partly relevant (o), generally not directly applicable (-)
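The applicability tables can also be read as a to-do list for adapting the checklist to internet data. A minimal sketch that lists the dimensions not marked fully relevant (+) for a given data source; the marks are copied from Tables 5 and 6, and `needs_adaptation` is a hypothetical helper name.

```python
# Applicability marks copied from Tables 5 (Source) and 6 (Metadata):
# + relevant, o partly relevant, - generally not directly applicable
COLUMNS = ("Offline route information", "Internet prices")
APPLICABILITY = {
    "Supplier":             ("+", "-"),
    "Relevance":            ("+", "+"),
    "Privacy and security": ("o", "o"),
    "Delivery":             ("+", "-"),
    "Procedures":           ("+", "o"),
    "Clarity":              ("+", "+"),
    "Comparability":        ("+", "+"),
    "Unique keys":          ("+", "+"),
    "Data treatment":       ("+", "o"),
}


def needs_adaptation(table, source):
    """Dimensions whose checklist questions are not fully relevant for `source`."""
    col = COLUMNS.index(source)
    return sorted(dim for dim, marks in table.items() if marks[col] != "+")


print(needs_adaptation(APPLICABILITY, "Internet prices"))
print(needs_adaptation(APPLICABILITY, "Offline route information"))
```

For internet prices this returns five dimensions, most of them in the Source hyperdimension, which matches the conclusion that the Source part needs the most adaptation.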
17. Missing quality aspects
Only for internet data
• Availability of the website
• Burden on website
• Errors in data on website
• Representativity of website information
• Possibility for automatically collecting data
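Two of these missing aspects, the possibility for automatically collecting data and the burden on a website, can be partially indicated by the site's robots.txt file. The sketch below uses Python's standard `urllib.robotparser` on robots.txt lines that have already been downloaded, so it needs no network access; the user agent, URLs, rules and the `automated_collection_check` helper are hypothetical, and robots.txt rules do not settle the legal issues of collecting data via websites.

```python
from urllib.robotparser import RobotFileParser


def automated_collection_check(robots_lines, user_agent, url):
    """Hypothetical indicator: may `user_agent` fetch `url`, and with what crawl delay?"""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse already-downloaded robots.txt lines
    allowed = parser.can_fetch(user_agent, url)
    delay = parser.crawl_delay(user_agent)  # seconds between requests, or None if not set
    return allowed, delay


robots = [
    "User-agent: *",
    "Disallow: /checkout/",
    "Crawl-delay: 10",
]
print(automated_collection_check(robots, "PriceCollector", "https://shop.example/products/milk"))
print(automated_collection_check(robots, "PriceCollector", "https://shop.example/checkout/cart"))
```

A crawl delay, when present, gives a rough upper bound on the acceptable request rate and so a first handle on the burden placed on the website.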
18. Overall conclusions
Source hyperdimension
• Directly applicable to route information
• Inherent differences for internet prices
Metadata hyperdimension
• Generally applicable
Future research will focus on:
• Adapting checklist to internet data
• Legal issues for internet data
• Data quality