What is Open Data?
1. DATAVIZ: VISUAL REPRESENTATION OF COMPLEX
PHENOMENA
data visualization & computational design
@ Better Nouveau Workshop
14/12/2011
What is Open Data?
Lorenzo Benussi, TOP-IX Consortium
lorenzo.benussi@top-ix.org
2. About me
Research & Business
Development
TOP-IX Consortium
Fellow, NEXA Centre
Polytechnic of Turin
Fellow, Department of
Economics University of Turin
5. Background
Ref: National Geographic http://ngm.nationalgeographic.com/big-idea/14/augmented-reality
6. BIG DATA stylized facts 1
• $600 to buy a disk drive that can store all the
world's music.
• 5 billion mobile phones in use in 2010.
• 30 billion pieces of content shared on Facebook
every month.
• 40% of projected growth in global data generated
per year VS 5% growth in global IT spending.
• 235 terabytes data collected by US Library of
Congress in April 2011.
• 15 out of 17 sectors in the United States have more
data stored per company than the US Library of
Congress
McKinsey: Big data: The next frontier for innovation, competition, and productivity (May 2011)
7. BIG DATA stylized facts 2
• $300 billion potential annual value to US health care - more
than twice the total annual health care spending in Spain.
• €250 billion potential annual value to Europe's public sector
administration - more than GDP of Greece.
• $600 billion potential annual consumer surplus from using
personal location data globally.
• 60% potential increase in retailers' operating margins
possible with big data.
• 140,000-190,000 more deep analytical talent positions and
1.5 million more data-savvy managers needed to take full
advantage of big data in the USA.
McKinsey: Big data: The next frontier for innovation, competition, and productivity (May 2011)
8. WEB(squared)
1. Redefining Collective Intelligence: New Sensory Input
2. Cooperating Data Subsystems
3. How the Web Learns: Explicit vs. Implicit Meaning
4. Web Meets World: The "Information Shadow" and the Internet of Things
5. The Rise of Real Time: A Collective Mind
Ref: Tim O’Reilly and John Battelle (2009), Web Squared: Web 2.0 Five Years On.
http://www.web2summit.com/web2009/public/schedule/detail/10194
9. Digital technology could enable an extraordinary range of
ordinary people to become part of a creative process.
(The future of ideas, Lawrence Lessig)
10. When I say that innovation is being democratized, I mean
that users of products and services—both firms and individual
consumers—are increasingly able to innovate for themselves.
(Democratizing Innovation, Eric von Hippel)
11. The value of metrics
• Data
• Information
• Knowledge
• Value
Hal Varian, Google’s Chief Economist
13. DATA as a SERVICE
Data are not locked inside applications; they are consumed on demand as
a service.
RESTful APIs make it possible to access data as a web resource (through a URI).
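A minimal sketch of what "data as a web resource" can look like from the consumer's side. The base URL, the dataset name and the shape of the JSON response are all hypothetical, chosen only for illustration; a canned response body stands in for the HTTP GET so the example runs offline.

```python
import json
from urllib.parse import urlencode, urljoin

# Hypothetical base URL, used only for illustration.
BASE_URL = "http://data.example.org/api/"


def dataset_uri(dataset, **filters):
    """Build the URI that identifies a dataset, optionally filtered."""
    uri = urljoin(BASE_URL, f"datasets/{dataset}")
    if filters:
        # Sort the filters so the same query always yields the same URI.
        uri += "?" + urlencode(sorted(filters.items()))
    return uri


def parse_records(body):
    """Decode a JSON response body into a list of records."""
    return json.loads(body)["records"]


uri = dataset_uri("air-quality", city="Turin", year=2011)

# Made-up response body; a real client would obtain it with an HTTP GET on `uri`.
sample_body = '{"records": [{"city": "Turin", "pm10": 48}]}'
records = parse_records(sample_body)

print(uri)
print(records)
```

The point of the sketch is that the consumer only needs the URI and a standard format; nothing about the application that produced the data leaks through.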
14. Business Models
A. Data owner: paid to publish / revenue share.
B. Data user: pays for data delivery/transformation/
analysis services.
New Generation Marketplace
3. Works with open and non-open data
4. Provides data on the fly through APIs (even custom ones).
5. Sometimes the community of data curators is
involved in maintaining and expanding the data through
crowdsourcing (e.g. Factual).
6. Provides tools (web based) to explore the data
15. What does open data mean?
Open Data is a model to extract value from
public sector information by using the data
to build new tools and to create innovative
services
16. PSI (public sector information) mines
• The public sector produces
and manages huge amounts of
data; opening up PSI
in the EU is estimated to produce economic
growth of €140 billion per year
(aggregate)
• Public Data are the raw
material to create new
products and services
COURTESY/RON WHEELER. The 8,000-foot deep Homestake Gold
Mine in South Dakota is the site where scientists, including UC
Berkeley researchers, plan to construct the world's deepest research
center.
17. data.gov
“Openness will strengthen our democracy and
promote efficiency and effectiveness in
Government”
Transparency and Open Government
Memorandum for the Heads of Executive
Departments and Agencies (2009)
[…] As you know, transparency is at the
heart of our agenda for Government. We
recognise that transparency and open data
can be a powerful tool to help reform public
services, foster innovation and empower
citizens.
David Cameron - Letter to Cabinet Ministers
(2011)
18. Information is the currency of democracy
Benjamin Franklin (attributed)
19. Raw data now!
"... give us the unadulterated data, we want the data, we want
unadulterated data. We have to ask for raw data now."
Tim Berners-Lee, advisor to data.gov.uk
21. Legislation in the EU, Italy and
Piedmont
EUROPE
Directive 2003/98/EC of the European Parliament and of the Council of 17
November 2003 on the re-use of public sector information
The evolution towards an information and knowledge society influences the life of every citizen in
the Community, inter alia, by enabling them to gain new ways of accessing and acquiring knowledge.
ITALY
Legislative Decree no. 36 of 24 January 2006 and
Law no. 96/2010.
PIEDMONT
Regional Government Resolution (Delibera di Giunta regionale) 36-1109, November
2010
25. apps4italy
• All EU citizens can participate (!!) & €40K
in cash prizes
• Building useful, innovative projects based on Italian
public data (not only open data)
• Four main categories (growing):
1. Ideas
2. Apps
3. Visualization
4. Datasets
Ref: appsforitaly.org
27. Open Knowledge Definition v.1.1 by OKF
A work is open if its manner of distribution satisfies the
following conditions:
1. Access
2. Redistribution
3. Reuse
4. Absence of technological restriction
5. Attribution
6. Integrity
7. No discrimination (persons or groups)
8. No discrimination (fields of endeavor)
9. Distribution of license
10. License must not be specific to a package
11. License must not restrict the distribution of other works
28. Open Definition - http://opendefinition.org/okd/
Version 1.1
Terminology
The term knowledge is taken to include:
1. Content such as music, films, books
2. Data, be it scientific, historical, geographic or otherwise
3. Government and other administrative information
Software is excluded [...]
The term work will be used to denote the item or piece of knowledge
which is being transferred.
The term package may also be used to denote a collection of works. [...]
The term license refers to the legal license under which the work is made
available. Where no license has been made this should be interpreted as
referring to the resulting default legal conditions under which the work is
available (for example copyright).
29. The Definition - A work is open if its manner of distribution
satisfies the following conditions:
1. ACCESS
The work shall be available as a whole and at no more than a reasonable
reproduction cost, preferably downloading via the Internet without charge. The
work must also be available in a convenient and modifiable form.
2. REDISTRIBUTION
The license shall not restrict any party from selling or giving away the work either
on its own or as part of a package made from works from many different sources.
The license shall not require a royalty or other fee for such sale or distribution.
3. REUSE
The license must allow for modifications and derivative works and must allow
them to be distributed under the terms of the original work.
30. 4. ABSENCE OF TECHNOLOGICAL RESTRICTION
The work must be provided in such a form that there are no technological
obstacles to the performance of the above activities. This can be achieved by the
provision of the work in an open data format, i.e. one whose specification is publicly
and freely available and which places no restrictions monetary or otherwise upon
its use.
5. ATTRIBUTION
The license may require as a condition for redistribution and re-use the attribution
of the contributors and creators to the work. If this condition is imposed it must
not be onerous. For example if attribution is required a list of those requiring
attribution should accompany the work.
6. INTEGRITY
The license may require as a condition for the work being distributed in modified
form that the resulting work carry a different name or version number from the
original work.
31. 7. NO DISCRIMINATION AGAINST PERSONS OR GROUPS
The license must not discriminate against any person or group of persons.
8. NO DISCRIMINATION AGAINST FIELDS OF ENDEAVOR
The license must not restrict anyone from making use of the work in a specific
field of endeavor. For example, it may not restrict the work from being used in a
business, or from being used for genetic research.
9. DISTRIBUTION OF LICENSE
The rights attached to the work must apply to all to whom it is redistributed
without the need for execution of an additional license by those parties.
10. LICENSE MUST NOT BE SPECIFIC TO A PACKAGE
The rights attached to the work must not depend on the work being part of a
particular package. If the work is extracted from that package and used or
distributed within the terms of the work’s license, all parties to whom the work is
redistributed should have the same rights as those that are granted in conjunction
with the original package.
11. LICENSE MUST NOT RESTRICT THE DISTRIBUTION OF OTHER WORKS
The license must not place restrictions on other works that are distributed along
with the licensed work. For example, the license must not insist that all other
works distributed on the same medium are open.
33. A paradigmatic shift:
information economy
• The transition from a physically-based to a knowledge-based
economic environment made information a primary
wealth-creating asset.
• Digital access to information seems to have changed the
structure of many industries, promoting services-oriented
business models based on disclosure and sharing of
information and knowledge.
34. A paradigmatic shift:
PSI data mines
• The Public Sector holds and manages huge amounts of
data and information. Fostering access to those repositories
enables new business opportunities that can broaden
market volumes in such sectors.
• PSI represents the raw material from which value added
products and services can be designed.
35. The use/value of PSI
PSI can be used and reused in many ways (non-rivalry in consumption):
1. Broad range of sectors
2. Different sets of actors
3. PSI holders
4. Private re-users
5. Regulatory bodies
6. Citizens
Several supply chain configurations:
1. Linear models (private re-users add value)
2. User generated contents
3. Information sharing between public bodies
36. The price of PSI:
the “free data” approach
• The peculiar cost structure of collecting, processing and
delivering digital data (high fixed costs, zero marginal cost) strongly
influences the pricing strategies available to PSI
holders.
• Pollock (2008): a price that equals marginal cost (i.e. PSI free of
charge) is socially optimal provided that elasticity of demand
and positive externalities exceed a given threshold.
✓ Empirics: those conditions are likely to hold in most
PSI domains.
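A toy calculation can make the trade-off concrete. The sketch below is not Pollock's model: it assumes a linear demand curve q = a - b*p, zero marginal cost, a fixed cost covered either by public subsidy (free data) or by the price (cost recovery), and it ignores externalities and the cost of raising public funds. All numbers are invented.

```python
import math


def cost_recovery_price(a, b, fixed_cost):
    """Lowest price p with p * (a - b*p) == fixed_cost, for demand q = a - b*p."""
    disc = a * a - 4 * b * fixed_cost
    if disc < 0:
        raise ValueError("fixed cost cannot be recovered at any price")
    return (a - math.sqrt(disc)) / (2 * b)


def welfare(a, b, fixed_cost, price):
    """Consumer surplus plus revenue minus fixed cost (marginal cost is zero)."""
    q = a - b * price
    consumer_surplus = q * q / (2 * b)
    return consumer_surplus + price * q - fixed_cost


# Invented parameters: demand intercept, demand slope, fixed cost of the dataset.
a, b, fixed_cost = 100.0, 1.0, 900.0

p_cr = cost_recovery_price(a, b, fixed_cost)
w_free = welfare(a, b, fixed_cost, 0.0)   # price equals marginal cost
w_cr = welfare(a, b, fixed_cost, p_cr)    # price covers the fixed cost

print(p_cr, w_free, w_cr, w_free - w_cr)
```

Under these assumptions the gap between the two welfare figures equals b*p²/2, the deadweight loss from pricing above marginal cost; the more elastic the demand, the larger that loss for a given price.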
37. The price of PSI:
cost recovery approach
• Although a cost recovery regime may limit potential demand
and distort competition, several critical issues could trigger its
adoption:
✓ Underestimation of downstream demand and network
externalities.
✓ Lack of long-run commitment in subsidizing PSI collection.
✓ Short-term decision making.
✓ Moral hazard (?).
38. The price of PSI: possible scenarios
Directive 2003/98/EC is aimed at fostering PSI reuse mainly by promoting:
1. PSI availability in digital format
2. Transparency of reuse conditions and pricing
3. Non-discrimination
Which market configurations are likely to emerge?
MEPSIR (2006)
Scenario | Directive impact | Main condition | Example
Closed shop | Minor. Public Sector bodies continue to control the supply chain. | Information is strongly linked with the functioning of public bodies. | Cadastral information
Battlefield | Non-negligible. New entrants step into the downstream market. | Information is important while not strategic for PA. | Meteorological data
 | Strong. Public Sector enlarges its influence over the downstream stages. | Digitalization offers new opportunities for value extraction. | Legal information
Playground | Non-negligible. Public Sector has the only role of information holder. | Information reuse generates high demand volumes from citizens and firms. | Traffic and transport information
39. The price of PSI:
Externalities & Policy
All pricing strategies encompass potential risks of inefficiency
for PSI holders (due to lack of incentives in reducing costs
and/or improving quality).
The importance of the regulatory framework
The Central Role of Externalities
41. Linked open data and Semantic web
The Semantic Web isn't just about putting data on
the web. It is about making links, so that a person
or machine can explore the web of data. With
linked data, when you have some of it, you can find
other, related, data. (by Tim Berners-Lee)
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those
names.
3. When someone looks up a URI, provide useful
information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can
discover more things.
Ref: http://www.w3.org/DesignIssues/LinkedData.html
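The four rules can be sketched with nothing more than a list of (subject, predicate, object) triples. This is an in-memory toy, not an RDF library: the example.org URIs and the `locatedIn` property are hypothetical, and `look_up` only imitates what an HTTP lookup of a URI would return.

```python
# All example.org URIs below are hypothetical, not real identifiers.
AIRPORT = "http://example.org/resource/Turin_Airport"
CITY = "http://example.org/resource/Turin"
COUNTRY = "http://example.org/resource/Italy"
LOCATED_IN = "http://example.org/ontology/locatedIn"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"  # the standard rdfs:label URI

# Rules 1 and 2: every thing and every property is named by an HTTP URI.
TRIPLES = [
    (AIRPORT, LABEL, "Turin Airport"),
    (AIRPORT, LOCATED_IN, CITY),
    (CITY, LABEL, "Turin"),
    (CITY, LOCATED_IN, COUNTRY),
    (COUNTRY, LABEL, "Italy"),
]


def look_up(uri):
    """Rule 3: return the information published about a URI."""
    return [(p, o) for s, p, o in TRIPLES if s == uri]


def follow_links(start, predicate):
    """Rule 4: walk from one URI to related URIs through a given link type."""
    path, current = [start], start
    while True:
        targets = [o for p, o in look_up(current) if p == predicate]
        if not targets or targets[0] in path:
            return path
        current = targets[0]
        path.append(current)


path = follow_links(AIRPORT, LOCATED_IN)
labels = [o for uri in path for p, o in look_up(uri) if p == LABEL]
print(labels)
```

Starting from a single name, the links lead to the city and then the country, which is the "when you have some of it, you can find other, related, data" idea in miniature.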
43. Linked open data: basic
principles
1. Everything has a name (people, locations,
etc.)
2. Every name starts with http://
3. All data are described using RDF
(Resource Description Framework, a W3C
standard).
Tim Berners-Lee talk on linked data:
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
47. Linked data - hands on
DBpedia provides information from Wikipedia as Linked Data.
Example, Turin airport: http://dbpedia.org/page/Turin_Caselle_Airport
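A hedged sketch of how a program, rather than a browser, might ask for the same resource. It assumes DBpedia's usual convention that the /page/ URL is the HTML view of a /resource/ identifier and that the server honours the HTTP Accept header; the request is only prepared, not sent, so the example runs offline.

```python
from urllib.request import Request

# Assumed identifier for the resource shown at /page/Turin_Caselle_Airport.
RESOURCE_URI = "http://dbpedia.org/resource/Turin_Caselle_Airport"


def linked_data_request(uri, media_type="application/rdf+xml"):
    """Prepare (but do not send) a GET request for a machine-readable form."""
    return Request(uri, headers={"Accept": media_type})


req = linked_data_request(RESOURCE_URI)
print(req.get_method(), req.full_url, req.get_header("Accept"))

# To actually fetch the data:
#   from urllib.request import urlopen
#   body = urlopen(req).read()
```

The same URI serves people and machines; only the requested media type changes.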
49. Open Data licenses 1 (OKF)
Open Knowledge Foundation licenses
1. Public Domain Dedication and License (PDDL):
“Public Domain for data/databases”
2. Open Data Commons Attribution License (ODC-By):
“Attribution for data/databases”
3. Open Data Commons Open Database License
(ODC-ODbL): “Attribution Share-Alike for data/databases”
Ref: http://www.opendatacommons.org/licenses/
50. Open Data licenses 2 (CC and IODL)
Creative Commons Licenses (http://creativecommons.org/licenses/)
1. CC Zero
2. CC BY - Attribution
3. CC SA - Share alike
4. CC BY-SA - Attribution and Share alike
Italian open data license (http://www.formez.it/iodl/)
• IODL - Italian Open Data License (BY-SA)
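The licenses named on these two slides can be summarised as a small lookup table of the obligations each one carries. The sketch below only encodes the labels given on the slides (public domain, attribution, share-alike); the field names are illustrative, and it is not a substitute for reading the license texts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    name: str
    attribution: bool   # reuse must credit the source
    share_alike: bool   # derivatives must keep the same terms


# Obligations as labelled on the slides.
LICENSES = {
    "PDDL": License("Public Domain Dedication and License", False, False),
    "ODC-By": License("Open Data Commons Attribution License", True, False),
    "ODC-ODbL": License("Open Data Commons Open Database License", True, True),
    "CC Zero": License("Creative Commons Zero", False, False),
    "CC BY": License("Creative Commons Attribution", True, False),
    "CC BY-SA": License("Creative Commons Attribution Share Alike", True, True),
    "IODL": License("Italian Open Data License", True, True),
}


def obligations(key):
    """List the obligations a re-user takes on under the given license."""
    lic = LICENSES[key]
    result = []
    if lic.attribution:
        result.append("attribution")
    if lic.share_alike:
        result.append("share-alike")
    return result


for key in LICENSES:
    print(key, obligations(key))
```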
60. Where to find open data
Open (and non-open) data archives
http://ckan.net/
http://it.ckan.net/
Examples of Italian datasets:
Dati.gov.it: http://www.dati.gov.it/
5T: http://biennaledemocrazia.it/dataset/
Dati Piemonte: http://dati.piemonte.it
ISTAT: http://dati.istat.it/
Enel: http://data.enel.com/
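Catalogues built on CKAN can also be searched programmatically. The sketch below assumes the portal runs a CKAN version that exposes the Action API (`/api/3/action/package_search`), which may not be true of the hosts listed above; the response body is a made-up example in that API's envelope so the code runs offline.

```python
import json
from urllib.parse import urlencode

# Assumption: this host exposes the CKAN Action API.
CKAN_BASE = "http://ckan.net"


def package_search_url(query, rows=5, base=CKAN_BASE):
    """Build a dataset search URL for the CKAN Action API."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base}/api/3/action/package_search?{params}"


def dataset_names(body):
    """Extract dataset names from a package_search JSON response."""
    data = json.loads(body)
    if not data.get("success"):
        return []
    return [pkg["name"] for pkg in data["result"]["results"]]


url = package_search_url("piemonte")

# Made-up response; a real client would fetch `url` over HTTP.
sample = '{"success": true, "result": {"count": 1, "results": [{"name": "example-dataset"}]}}'
names = dataset_names(sample)

print(url)
print(names)
```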
61. Tools and links
ONLINE DATA VISUALIZATION
Google Visualization API: http://code.google.com/intl/it-IT/apis/chart/
Tableau Public: http://www.tableausoftware.com/public
Open Heat Map: http://www.openheatmap.com/
ONLINE STORAGE+VISUALIZATION
Google Public Data explorer: http://www.google.com/publicdata/home
IBM Many Eyes: http://www-958.ibm.com/software/data/cognos/manyeyes/
Google Fusion tables: http://www.google.com/fusiontables/Home
Impure: http://www.impure.com/
CURATION & LINKING
Google Refine
Data Wrangler: http://vis.stanford.edu/wrangler/
OFFLINE TOOLS
R: http://www.r-project.org/
JavaScript library for data viz: http://thejit.org/
Also this one: http://vis.stanford.edu/protovis/
Network / graph analysis / visualization: http://gephi.org/
Turing-complete language for dataviz, aimed at visual artists: http://processing.org/
62. wrap-up
1. Not all public data are open data
2. Public data and gov data are
often “broken” (strange formats
and ambiguous IP)
3. Open Data makes sense if we put
it in perspective: the rise of Big
Data