Open North: founded in 2011, based in Montréal but with a Canada-wide scope, more focused on execution than on advocacy. ---- We crunch open data to make it compelling for citizens, link open data with open government to provide a feedback loop, and fill the gap between governments and citizens.
Presentation of the framework: no rocket science, just some (admittedly imperfect) analogies. Many say data is the new natural resource, but it’s not natural, nor really a resource… Anyway, let’s run with the analogy for a moment: what kinds of resources do we have on Earth? Some look obviously interesting, like diamond: beautiful and incredibly hard. Some look… less interesting, like bauxite ore, which gives us aluminum.
How do we get value from these two different resources? Diamond: quite easy (in theory). Find it, cut it, and you have something extremely valuable. Humans assigned high value to diamonds 4,000 years ago, and they are still valuable. Bauxite/aluminum: first, you have to discover there is something interesting there at all, which only happened around 1800 for bauxite. Then it takes a hell of a lot of steps to extract the valuable part of the ore: you process huge quantities of ore to get a decent amount of aluminum. And once you have pure aluminum, you are still not done: either you produce 100% aluminum products (and you need to invent processes for that), or, most of the time, aluminum serves only as a part of a larger product, like a car. In that case, aluminum has value only because other products exist that need it. Even if we had figured out how to extract and use aluminum 1,000 years earlier, it would have largely remained unused.
Let’s come back to our topic: open data! More precisely transportation data, and even more precisely traffic and transit data. Recently, I’ve been working with some data from the SF Bay Area. The number of apps cannot be considered the definitive measure of a dataset’s success and value, BUT such a discrepancy between traffic and transit shows something. Let’s try to look at the value of the data by itself, and forget about licenses, community management, and outreach. They are important, but as the SF case exemplifies, transit data is very successful while traffic data is not. Why is that? Is public transportation so much cooler than cars that it drags the whole market? Or is it that traffic data seems less valuable? We can’t be sure, but let’s look at some criteria that differentiate the two types of data.
Open-data-oriented standards = specs available to anybody, for free, under an unproblematic license, with no barrier or mechanism that makes access difficult or expensive (e.g. access through a hub, need for huge infrastructure, or use of proprietary technologies). For traffic, TMDD and DATEX are probably the closest to open data standards (except that TMDD is not free), but they are too complex for open data, mainly designed for center-to-center communication rather than for travelers. Why does standardization matter? Clear documentation to build tools. And when an app is based on a standard, each new place adopting that standard is a new market, so a development investment is more likely to be amortized. Building an application on a custom format, with low hopes that other places will publish similar datasets, is not the best way to develop a product…
But BART has a large number of apps built on its real-time API, which is not standard. So why? Even though it is non-standard, the data is easy to use. More precisely, once you have grabbed the data, you can provide very interesting information without anything else: transit data is by definition self-sufficient. Each transit network can be isolated easily, and each network is well known. Buses use roads, but you don’t really need road data. Schedules, fares, positions: you have more or less everything you need. Traffic data is not self-sufficient at all.
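A minimal sketch of how little it takes to get useful information out of BART’s legacy real-time API (the endpoint and public demo key are taken from BART’s own documentation, but treat the exact response fields as assumptions to verify):

```python
# Sketch: real-time departures from BART's legacy ETD API.
# Assumes the documented api.bart.gov/api/etd.aspx endpoint and
# BART's public demo key; check bart.gov before relying on either.
import json
from urllib.request import urlopen

STATION = "EMBR"  # Embarcadero
URL = (
    "https://api.bart.gov/api/etd.aspx"
    f"?cmd=etd&orig={STATION}&key=MW9S-E7SL-26DU-VV8V&json=y"
)

with urlopen(URL) as resp:
    data = json.load(resp)

# The feed alone answers "when is my train?" --
# no road network, no other dataset required.
for station in data["root"]["station"]:
    for etd in station["etd"]:
        times = ", ".join(e["minutes"] for e in etd["estimate"])
        print(f"{etd['destination']}: {times} min")
```

The point is not the API itself but the absence of dependencies: one HTTP call and you already have something a traveler cares about.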
Complexity and self-containedness are close but different. Transit is self-sufficient and simple to model: GTFS is a handful of fixed tables and fields that give a good representation of real life. Modeling traffic data in general, or even some specific subsets like events, is more complex. To keep the models simple, they have to deviate from reality more than is the case with transit. The question is where the equilibrium lies between simpler data and deviation from reality. In any case, more complex data is more difficult to integrate into tools.
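To make “a handful of fixed tables” concrete, here is a sketch that reads core GTFS files with nothing but the standard library (file and column names come from the GTFS spec; the feed path is a placeholder):

```python
# Sketch: a GTFS feed is just zipped CSV tables with fixed names.
# Any spec-compliant feed -- from any city -- parses the same way,
# which is exactly why a standard amortizes development effort.
import csv
import io
import zipfile

FEED = "gtfs_feed.zip"  # placeholder: any GTFS feed

def read_table(zf, name):
    with zf.open(name) as f:
        # utf-8-sig strips the BOM some agencies include
        return list(csv.DictReader(io.TextIOWrapper(f, "utf-8-sig")))

with zipfile.ZipFile(FEED) as zf:
    routes = read_table(zf, "routes.txt")
    trips = read_table(zf, "trips.txt")
    # other core tables: stops.txt, stop_times.txt, calendar.txt

# e.g. count scheduled trips per route -- no external data needed
trips_per_route = {}
for trip in trips:
    rid = trip["route_id"]
    trips_per_route[rid] = trips_per_route.get(rid, 0) + 1
for route in routes:
    name = route.get("route_short_name") or route.get("route_long_name")
    print(name, trips_per_route.get(route["route_id"], 0))
```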
Reliability of the data is partly linked to complexity. When the data model is a little further from reality, it becomes less reliable; it cannot take into account all the possible cases of real life. On top of that, the ability to monitor the subject is key. Real-life transit usually fits the data well. Yes, a bus might get stuck in traffic, but at least it should pass where it is supposed to, and getting real-time data is simple (if not inexpensive): GPS. Monitoring traffic is much more difficult: you can’t put sensors everywhere, maintaining a sensor network is expensive, and you can’t put cameras everywhere. Overall, it is much more difficult to get reliable data on traffic.
Remember the diamond/aluminum analogy? Transit is the diamond of transportation data, and traffic data is its aluminum. So traffic data should be valuable, right? What does that value look like, and how do we extract it? Let’s jump into a techno-utopian world a few years from now. I ask my smartphone for a trip, and it proposes the fastest way. Transit: we know it. Bike share: data is usually available. Walking: needs road data. By the way, this is not so futuristic: startups and Google Now are more or less doing this already. But large chunks of data are still missing, data that should be open.
In order to get the car option right, we need a lot of data, as the sketch below illustrates.
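A hedged sketch of why: even a naive car travel-time estimate has to stitch together several independent datasets. Every function and field name below is invented for illustration; the inputs stand in for the kinds of datasets involved.

```python
# Hypothetical sketch (all names invented): even a naive car option
# chains several datasets, none of which is useful in isolation.
def estimate_car_trip(route_km, live_speeds, incidents, parking):
    """route_km would itself come from a road network dataset
    (e.g. OpenStreetMap); live_speeds from sensors or probe
    vehicles; incidents from roadworks/accident feeds; parking
    from availability feeds."""
    speed_kmh = live_speeds.get("average_kmh", 40.0)
    minutes = 60.0 * route_km / speed_kmh
    minutes += sum(inc.get("delay_min", 0) for inc in incidents)
    minutes += 2.0 if parking.get("spots_free", 0) > 0 else 15.0
    return minutes

print(estimate_car_trip(
    route_km=8.5,                       # from the road network
    live_speeds={"average_kmh": 25.0},  # real-time speed data
    incidents=[{"delay_min": 6}],       # events along the route
    parking={"spots_free": 3},          # parking availability
))
```

Contrast this with the BART example earlier: the transit answer needed one feed, while the car answer needs four or five, each collected and maintained separately.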
This is where the analogy becomes clear: diamond data’s value can be extracted easily; aluminum data needs a large framework. The case for opening aluminum data might seem less obvious: it takes more knowledge and more infrastructure to use it, and the perennial geek who demonstrates the value of a dataset during a hackathon will have a harder time doing his job. Incumbent private companies seem able to buy or build the data they need. But in the end, this is the real place where open data is crucial: developing new uses that are not obvious, helping serious newcomers overcome market barriers, and analyzing data to provide useful insights. Processing complex things is more and more “open”: think of open source plans to build cars or buildings. It is not accessible to just anybody, but it is a major change in business practice.
What would a presentation be without a Venn diagram? More or less stolen from David Eaves. Aluminum data frequently needs other data, as in the car trip example. But as you can see, it cannot be restricted to government data; there is a case for more than government data. Through existing initiatives, there is currently a focus (mainly in the US) on opening personal data: Blue Button, Green Button. Give people back their data to be used and integrated; use data provided by people. It is also, more and more, the case that private companies open their data. “Open” can be contentious because there is not always a real open license on it, but things will probably evolve. Why would private companies open their data? To let people know about their product (e.g. parking), to show market superiority (vehicle efficiency), or to create a platform.
Gartner’s hype cycle: inflated expectations frequently bring disillusionment, and not all governments follow the cycle at the same pace. Diamond data is the kind that tends to create inflated expectations: everything looks simple… but it is not. Most of the value of open data lies in “aluminum” data, where value extraction takes longer; that is the slope of steady growth. The Internet lived through the same cycle: at the end of the 90s, the Internet was going to change the world, but by 2000 it sounded like massive vaporware. Now, in 2013, the Internet is bigger and more important than what most people expected at the peak of expectations in the 90s (e.g. Stairway to Heaven). Open data could follow the same path as the Internet, if all actors keep pushing the development of open data and building on “aluminum” data.