Some thoughts about hack days, developers, and data... Performed by Neontribe at the Culture Hack East data day: more on that at http://culturehackday.org.uk NB: Read the notes view. Doesn't make a lot of sense otherwise...
We spend a lot of time working with open data – this time about public toilets in London. A neat example of the open data niche small companies seem to find more comfortable than larger organisations. This particular project had us and the redoubtable @symroe taking various files, in various formats, and turning them into an API, which we then wrote a web app to display. http://greatbritishpublictoiletmap.rca.ac.uk/
And we made this, which is a neat little page that asks you for a DfID project number, and spits it out into a widget that references all the publications related to that project. http://r4d.herokuapp.com/
We've been to a fair few Rewired State hack days, and we've mentored for Young Rewired State since its inception. http://rewiredstate.org/
And I'm here today to talk about how to make your data developer-ready – in a pretty wide sense. Not just about formatting your data helpfully, but about how to make it a bit more useful and a bit more interesting to developers. I'm not a developer myself, but I work with them and I've organised hack events, and made a hack at a hack day myself...
Warning!
This presso makes extensive use of metaphor. Please treat with caution!
Let's imagine this is the developer community. Buzz buzz buzz, work work work.
And you have data. (Event listings. Lists of locations of things. Service outcomes. Information about artefacts.)
And here's where we'd like to end up at the end of a hack day. Tools, proofs-of-concept, mashups, a visualisation or two...
I told you there'd be metaphor...
It's all about effective collaboration. You and the bee. The pollen is your data. You are the flower: scent, colour, patterns, nectar and all...
Geeks are scared too, on hack days. They have to perform in a day. They could do with help with some preparation... Writing about your data will make them aware of it before the hack day, and that might help them choose your data and make something better with it. Don't have set ideas, but try and give a scent of inspiration... Use the event hashtag to get the word out. This is a long-range kind of communication, before the actual event.
Colour! Some idea of what's available, where to start. Have an open data page somewhere. “Here are our problems...” Your mission statement and objectives? Enthuse! Look what's over here! Patterns! Landing strips... Pointers to the data. And in it. You know what the jargon means, you know your TLAs. Help get that information out. “This is how to understand what's here.” Use uniform identifiers: Postcodes. Charity numbers. ISO 8601 for dates. (Excel speaks ISO 8601...) That's a Yellow Bee Orchid - Ophrys lutea – by the way.
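For the more technical reader, here's a rough sketch of what "ISO 8601 for dates" means in practice. It's Python, and the list of input formats is my own assumption about how dates might turn up in a spreadsheet, not anything from the Festival data. The function name is made up for illustration.

```python
from datetime import datetime

# Assumed input styles; swap in whatever your own spreadsheet actually contains.
KNOWN_FORMATS = ["%d/%m/%Y", "%d %B %Y", "%d %b %Y", "%Y-%m-%d"]


def to_iso_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if it isn't recognised."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # not this format, try the next one
    return None


print(to_iso_date("11/05/2012"))
print(to_iso_date("11 May 2012"))
print(to_iso_date("Sunrise"))
```

Anything that comes back as None is a value a human needs to look at, which is a useful list to have before a hack day.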
This is a bee-friendly flower! It's very obvious where the data is, how to use it. Developer-friendly data is easy to work with. Data that's easy to hack with: dates and places. Mapping is easy, but a bit old these days. Data that changes over time is the new(er) thing. (Take a look at Hans Rosling's TED talks – this for example - http://www.ted.com/talks/hans_rosling_at_state.html ) Locational data that changes over time... Yum. Nectar. (Yes, I'm pretty sure this flower has no scent. I did say the metaphor should be approached with caution...)
Cool hacks use your data along with someone else's. So make your data play nice with other orgs... It's great to formally link your data to other people's, but it is some extra work, and it can be technical. It's good to have an API, particularly one that's standards-based, but hack day people are used to working without them. It's interesting data they can work with that they'll be looking for. Data that make good stories. Look at Jeni Tennison's work with crime data (NB: I got this wrong on the day – I remembered Jeni talking about “birth rate” data, when in fact she pointed up trends in reported cases of bigamy... My bad.) or Anna Powell-Smith's recent work with baby names at http://darkgreener.com/baby-name-data. Interesting stuff.
I tried to make a slide about the waggle-dance, but my metaphor was going too far as it was. Maybe something about devs. telling each other about good data sets. They do. There's something about giving clear signals on the day, maybe wearing a name badge with your data information on it, being there to answer questions about it, something like that.
Importantly: make it open if you can. See, this data is closed up. Won't attract any but the most determined bee.
That's pretty open...
But look, this is it. Flowers must be open... Licensed, yes.
This is a Creative Commons license. Might be appropriate for some of your data.
See, an excited bee. (NB: I have no basis for knowing this is an excited bee beyond it was the first image on a Flickr search for “excited bee” that was available under the right license, and in the correct aspect ratio.)
This would be too excited.
It's all about effective collaboration. You and the bee. You are the flower, and the pollen is your data.
Thanks.
This is us. Neontribe. Web developers.
Now... Part 2. The same kind of thing, without the metaphor, and with a howto at the end.
“It is the unexpected re-use of information which is the value added by the web.” T B-L.
Developer ready == fit for unexpected re-use.
Tim Berners-Lee's “5-star” rating for open, linked data is a pretty darned good short-hand for developer-ready data. “1-star: Available on the web (whatever format) but with an open licence to be Open Data”
2-stars: “Available as machine-readable structured data (e.g. excel instead of image scan of a table)”
3-stars: “as (2) plus non-proprietary format (e.g. CSV instead of excel)”
4-stars: “All the above plus, Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff.” I'm not going to talk any more about that today, partly because I'm not technical enough to answer all your possible questions, and partly because while it'd be fabulous if you all had your data in 4-star format, I'm not so sure it's possible between now and the hack day.
“5-stars: All the above, plus: Link your data to other people’s data to provide context” Again, I'm not going into this today – there's a lot to learn and think about here. The Museums Computer Group is one place to start – a friendly mailing list for the GLA community to talk about data amongst other issues at https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=mcg
Oh.
One more thing.
Don't worry
Your data formatting doesn't have to be perfect. In the early days of Rewired State, developers scraped data out of government websites by brute force. Giving a dev a .csv will be a lot easier than that...
Save as... CSV. That's my hope for today, that you'll know to save as .csv because that'll really help devs work with your data. It'd be great if it were on the web and openly licensed, but even without that... Turn up with a memory stick and an interesting dataset, and you might grab someone's interest.
Save as... CSV
So: here's the shortcut to publishing open data – useful to anyone who knows their spreadsheet, and has a Google account. Licensing isn't my specialist subject... I doubt any technical guru is going to think your .csv files are absolutely the best format possible, but they will make it a bit easier for developers to work with your data.
Here's the website of the marvellous Norfolk and Norwich Festival. I'm very keen on their work: they're from Norwich and they have an international reputation. I used their data at Rewired State Norfolk last year, and for my purposes today, it's a useful example of event information...
Here's their brochure in pdf format, available on the Internet. Good.
I want to use it as an example, so I've just pasted the first column into a spreadsheet. (I use LibreOffice, it could just as easily have been Excel.) You can see it looks a little irregular at the moment. As you'd expect.
Just taking that first row, I split it into four columns.
Starting to look a bit more regular now, after a bit of cutting and pasting, and my spreadsheet has guessed that some of those numbers are times and given them a standard format. Interesting, there's an event that's not worked for – The Voice Project starts at “Sunrise”.... Easy for a human being to get that, harder for a machine...
So I nip over to another website to find the time... I generously forgave myself for using London sunrise time rather than Norwich's.
So, here it is a bit more regular. I've made the dates into a standard format, added that sunrise time. More of a consistent pattern.
Added a header row, to explain to the machine what these columns mean. (NB: I made a small mistake here too – it'd be better to make “Start Time” into one word, “StartTime” or “Start_Time”. Just makes it a little easier for the machines....)
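If you'd rather not rename the headers by hand, a developer could do it in a couple of lines. This is only a sketch in Python: the function name is made up, and apart from “Start Time” the column names are my guesses, not the ones on the slide.

```python
import re


def machine_friendly(header):
    """Turn a human header such as 'Start Time' into 'Start_Time'."""
    # Collapse any run of characters that aren't letters or digits into one underscore,
    # then trim underscores left over at either end.
    return re.sub(r"[^A-Za-z0-9]+", "_", header.strip()).strip("_")


# Hypothetical header row; only "Start Time" comes from the talk.
headers = ["Date", "Start Time", "Event", "Venue"]
print([machine_friendly(h) for h in headers])
```

The underscore version is used here rather than “StartTime” simply because it keeps the original words visible; either would do.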
Here we go - “Save as... CSV” With a nice file name, no spaces.
This is a little technicality – the comma symbol is what separates the items in a row, and the quotation marks denote text... Just click OK. Now, upload that file onto a blog post under an open license, and you've got 3-star open data. Then talk about it, say what you love about it, inspire developers to get in and make stuff with it. It's not a perfect format, but it removes a lot of the barriers between a dev and using your data.
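To show why those quotation marks matter, here's a small Python sketch using the standard csv module. The rows are made up for illustration (they are not real Festival listings); the point is that an event name containing a comma survives the round trip because it's wrapped in quotes.

```python
import csv
import io

# Made-up example rows, not real Festival data.
rows = [
    ["Date", "Start_Time", "Event", "Venue"],
    ["2012-05-11", "19:30", "Music, Words and Light", "Example Hall"],
]

# Write the rows as CSV text, putting quotation marks around every text value.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC).writerows(rows)
text = buffer.getvalue()
print(text)

# Read it back: the comma inside the event name is not treated as a separator.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed[1][2])
```

Without the quotes, a machine would see five items in that second row instead of four.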
I wanted to go a little further, and I know a few hack day devs who rather like a data format called JSON. (Javascript Object Notation...) So I turned to my Google account.
See, we use Google Docs a lot. (Yes, I know there are some licensing issues... Not my area of expertise.) Particularly interesting – that UK GDP data set from the Guardian. I hit the red “create” button.
Got myself a new spreadsheet, copied that bit of Festival data in.
I could choose some sharing settings – as you can see, I've not published the sheet but you'd need to to make it open. (Sharing settings is linked from the File menu)
Also from the File menu, I could choose to publish to the web. There are a few options – one of them is “CSV” as it happens, so you'd not even need to upload your csv file, you can link to its Google Docs incarnation. I chose ATOM because I want to use a trick... Grab the web address from there...
Do what it says on this slide, and you get JSON.
Which looks a bit like this. Not as human-readable, but a bit more machine-readable. And that's it. Thanks for listening.
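A footnote for anyone without a Google account: a developer can get from your .csv to JSON without the Google Docs trick at all. Here's a hedged sketch in Python using only the standard library. The data is a made-up stand-in for the saved file, and this is a different route from the one on the slides, not a description of it.

```python
import csv
import io
import json

# Made-up stand-in for the saved file; in practice you'd open your own .csv.
csv_text = """Date,Start_Time,Event
2012-05-11,19:30,Example Concert
2012-05-12,05:08,Example Dawn Chorus
"""

# Each row becomes one record, keyed by the names in the header row.
records = list(csv.DictReader(io.StringIO(csv_text)))
json_text = json.dumps(records, indent=2)
print(json_text)
```

Which is another reason the header row matters: those column names become the labels on every record.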