This document discusses the transition of a government-funded aggregation of childcare and activity data to an open data model with lower costs. It outlines efforts to migrate the data infrastructure to lower-cost hosting, and the challenges of integrating open spatial data such as CodePoint Open and OS Locator, which lack a mapping from postcodes to road names. It presents a vision of a stand-alone gazetteer app for local geocoding and reconciling multiple data sources, identifies Apache SOLR and ElasticSearch as potential tools, and notes progress on an open-source gazetteer framework.
1. Open Spatial Data
Progress towards a reusable gazetteer
Open Data Group – 16th April 2012
@ianibbo
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
3. Overview
Original Problem
How to transition a central government funded aggregation of
childcare and positive activities data with a budget of
>£2m / year to an open data* model running on £60/month
hardware
Retaining security (Of a certain level)
Retaining functionality
(See http://www.madwdata.org.uk/blog/id/394)
4. 2 Major Costs To Mitigate
Large cluster of proprietary OS hosts, ~12 front
end web servers, hot backup sql server
Migrated to 1*Pound Host server ~£60/month, server
has 2 hard drives, hot backup, off site rsync
Data costs – BPH Address-Point data – Used for
geocoding incoming records and lookups on
search terms. OS Boundary Line
???
6. Open Spatial Data
Ordnance Survey Open Data
http://www.ordnancesurvey.co.uk/oswebsite/products/os-lo
Code Point Open
Postcodes to Northing/Easting
OS Locator
Gazetteer of road names (And other features)
Obtained by registering on website, requesting,
getting email, following link, …..
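As a rough illustration of the "Postcodes to Northing/Easting" lookup, the sketch below loads postcode rows into a dictionary. The column positions (postcode in column 0, easting in column 2, northing in column 3) are an assumption about the headerless CSV layout and should be checked against the column documentation shipped with the release; the sample rows are made up.

```python
import csv
import io

# Assumed column positions in a headerless Code-Point Open CSV row.
# Verify against the column header document supplied with the data.
POSTCODE_COL, EASTING_COL, NORTHING_COL = 0, 2, 3


def normalise_postcode(raw):
    """Upper-case a postcode and strip all whitespace so lookups ignore formatting."""
    return "".join(raw.split()).upper()


def load_postcodes(lines):
    """Build {postcode: (easting, northing)} from an iterable of CSV lines."""
    lookup = {}
    for row in csv.reader(lines):
        if len(row) <= NORTHING_COL:
            continue  # skip blank or short rows
        postcode = normalise_postcode(row[POSTCODE_COL])
        lookup[postcode] = (int(row[EASTING_COL]), int(row[NORTHING_COL]))
    return lookup


# Made-up sample rows, not real Code-Point Open records.
SAMPLE = io.StringIO(
    '"ZZ1 1AA",10,400000,300000\n'
    '"ZZ1 1AB",10,400120,300050\n'
)
postcodes = load_postcodes(SAMPLE)
print(postcodes[normalise_postcode("zz1 1aa")])
```

Normalising the key on both load and lookup means "ZZ1 1AA", "zz11aa" and "ZZ1  1AA" all resolve to the same entry.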
7. The reality of CodePoint Open
The core data is “Open”
Missing the one vital link between CodePoint
Open and OS Locator – PostCode → Road
Names / Identifiers.
If you're happy to display Postcodes without road
names, it's ideal.
Last Mile Problem.
Finding an automated way to link the 2 is hard!
Licensed data is now open, but out of date
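One naive way to attempt the missing PostCode → Road Name link is a nearest-neighbour match on grid coordinates. The sketch below is not the approach taken in the talk; it assumes each road can be reduced to a single representative easting/northing point, and the function names, the 250 m cut-off and the sample data are all hypothetical. It also hints at why the problem is hard: a long road is poorly described by one point, and the nearest road is not always the right one.

```python
import math


def nearest_road(point, roads):
    """Return (road_name, distance_m) for the road point closest to `point`."""
    best_name, best_dist = None, math.inf
    for name, (easting, northing) in roads.items():
        dist = math.hypot(easting - point[0], northing - point[1])
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist


def link_postcodes(postcodes, roads, max_distance_m=250.0):
    """Map each postcode to its nearest road, or None if nothing is close enough."""
    links = {}
    for postcode, point in postcodes.items():
        name, dist = nearest_road(point, roads)
        links[postcode] = name if dist <= max_distance_m else None
    return links


# Made-up coordinates in metres on the national grid.
roads = {"HIGH STREET": (400010, 300010), "MILL LANE": (400500, 300400)}
postcode_points = {"ZZ11AA": (400000, 300000), "ZZ19ZZ": (410000, 310000)}
print(link_postcodes(postcode_points, roads))
```

A brute-force scan like this is quadratic; a real import would need a spatial index, which is part of what the search tools discussed later provide.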
9. Problem with focus on “Open Data”
Everyone ends up implementing their own
gazetteer
Large scale providers have rate limits and
introduce external dependencies / Speed
issues
People want local geo-coding (for lots of different
reasons).
Having rolled your own gazetteer, you discover
you need to handle updates (Full replacements)
It's not an end in itself
10. Vision
A stand-alone gazetteer web app designed for
local network use with features for importing
updates from OS, reconciling multiple data
sources and performing geo-coding lookups.
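A minimal in-memory sketch of the three features named in the vision: importing updates as full replacements, reconciling multiple sources, and geocoding lookups. The class name, method names and the "first source in priority order wins" reconciliation rule are assumptions for illustration, not the design of the actual framework.

```python
class Gazetteer:
    """Toy gazetteer: one complete snapshot per source, searched in priority order."""

    def __init__(self, source_priority):
        self._priority = list(source_priority)
        self._by_source = {}

    @staticmethod
    def _key(term):
        """Collapse whitespace and upper-case so lookups ignore formatting."""
        return " ".join(term.split()).upper()

    def replace_source(self, source, records):
        """Full replacement: discard the old snapshot for `source` and load a new one."""
        if source not in self._priority:
            raise ValueError(f"unknown source: {source}")
        self._by_source[source] = {self._key(k): v for k, v in records.items()}

    def geocode(self, term):
        """Return the first hit in priority order, or None if no source knows the term."""
        key = self._key(term)
        for source in self._priority:
            hit = self._by_source.get(source, {}).get(key)
            if hit is not None:
                return {"source": source, "easting": hit[0], "northing": hit[1]}
        return None


# Hypothetical source names and made-up coordinates.
gaz = Gazetteer(["local-corrections", "codepoint-open"])
gaz.replace_source("codepoint-open", {"ZZ1 1AA": (400000, 300000)})
gaz.replace_source("local-corrections", {"zz1 1aa": (400005, 300002)})
print(gaz.geocode("zz1  1aa"))
```

Keeping each source as its own snapshot means a new release from one supplier can be swapped in without touching locally maintained corrections.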
11. Available Tools
Apache SOLR
Long-Standing stalwart of the open data and search
community
Schemas slightly clunky
Several spatial options, all with different strengths /
weaknesses. Multiple points a problem in some.
ElasticSearch
Schema Free, Apparently Solid Spatial, Multi Points
Good integration with Mongo via Rivers
12. Problems / Issues
ES Spatial search hard to do directly via a COOL
URL
Spatial query syntax is expressive, but complex and
needs JSON sub-documents
Need service wrappers
But that's easily done
Updates!
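The kind of service wrapper mentioned above can be sketched as a function that turns simple URL parameters into the nested JSON a spatial query needs. The body follows the `geo_distance` query shape from current Elasticsearch documentation, which may differ from the version in use at the time of this talk. The `/near` path, the parameter names and the `location` field name are hypothetical.

```python
import json
from urllib.parse import parse_qs, urlparse


def geo_query_from_url(url, field="location", default_distance="1km"):
    """Translate e.g. /near?lat=..&lon=..&distance=.. into a geo_distance query body."""
    params = parse_qs(urlparse(url).query)
    lat = float(params["lat"][0])
    lon = float(params["lon"][0])
    distance = params.get("distance", [default_distance])[0]
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        # `field` must be mapped as a geo_point in the index.
                        field: {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }


body = geo_query_from_url("/near?lat=53.38&lon=-1.47&distance=2km")
print(json.dumps(body, indent=2))
```

The caller sees a short, bookmarkable URL, while the JSON sub-document stays an internal detail of the wrapper.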
13. Missed Level of Abstraction
(Common to many open data sets?)
[Diagram labels: Source, Compare, Local Copy, Processing]
[Diagram annotations: "NOSQL like Mongo is ideal for this"; "ES ideal for this"]
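One reading of the diagram is that each new release from the source is compared with the local copy before anything downstream is reprocessed. Under that assumption, the compare step can be sketched as a diff of two keyed snapshots; the function name and the sample records are hypothetical.

```python
def compare_snapshots(local, source):
    """Return (added, removed, changed) between the local copy and a new source release."""
    added = {k: source[k] for k in source.keys() - local.keys()}
    removed = {k: local[k] for k in local.keys() - source.keys()}
    changed = {
        k: (local[k], source[k])
        for k in local.keys() & source.keys()
        if local[k] != source[k]
    }
    return added, removed, changed


# Made-up records keyed by postcode.
local_copy = {"ZZ11AA": (400000, 300000), "ZZ11AB": (400120, 300050)}
new_release = {
    "ZZ11AA": (400000, 300000),
    "ZZ11AB": (400125, 300050),
    "ZZ11AC": (400300, 300200),
}
added, removed, changed = compare_snapshots(local_copy, new_release)
print(len(added), len(removed), len(changed))
```

Only the added, removed and changed records then need to flow through to processing, even though the supplier ships a full replacement each time.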
14. Progress
Starting to extract code from existing services
into a generic spatial app
https://github.com/ianibo/AnOpenGazetteerFramewo
Work progressing under aegis of GIST Mobile
group / Open Data group
Workable Gaz now, but command line interface
for importing.
16. Some supporting info
Original Project – FOI request to DfE
Total costs - First 3 years
[Chart. Y axis: 0 to 7,000,000. X axis: 2008-09, 2009-10, 2010-11. Series: Local Authority Revenue, Local Authority Capital, Central Office of Information, Qi Consulting, Redhouse, DfE Staff Costs, Consultation seminars, Methods Consulting, Engine Group, Digital Public, Tribal Education]
17. First 3 years - Non LA costs
[Chart. Y axis: 0 to 2,500,000. X axis: 2008-09, 2009-10, 2010-11. Series: Central Office of Information, Qi Consulting, Redhouse, DfE Staff Costs, Consultation seminars, Methods Consulting, Engine Group, Digital Public, Tribal Education]