5. 2
Repeatable Transformation
Transformation should be part of routine ...
... manageable and scalable...
... repeatable ...
Linked Data will not be the official source anytime soon
http://www.w3.org/TR/prov-overview/
Provenance is key
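A minimal sketch of what a repeatable transformation with a provenance record could look like. The function names, the example.org namespace, and the fields of the record are assumptions made for illustration; the slides do not prescribe a specific implementation.

```python
import hashlib
import json
from datetime import datetime, timezone


def to_triples(rows, base="http://example.org/id/"):
    """Deterministically convert source rows (dicts with an 'id' key) to triples."""
    triples = []
    # Sorting rows and keys makes the output independent of input order.
    for row in sorted(rows, key=lambda r: str(r["id"])):
        subject = f"{base}{row['id']}"
        for key in sorted(k for k in row if k != "id"):
            triples.append((subject, f"{base}prop/{key}", str(row[key])))
    return triples


def run_transformation(rows, source_url, script_version, started_at=None):
    """Run the conversion and return the triples plus a provenance record."""
    started_at = started_at or datetime.now(timezone.utc).isoformat()
    source_bytes = json.dumps(rows, sort_keys=True).encode("utf-8")
    triples = to_triples(rows)
    record = {
        "source": source_url,                                   # where the data came from
        "source_sha256": hashlib.sha256(source_bytes).hexdigest(),  # which version of it
        "script_version": script_version,                       # which code produced the output
        "started_at": started_at,                               # when the run happened
        "triple_count": len(triples),
    }
    return triples, record
```

Because the source remains the official copy, the transformation can be rerun whenever the source changes, and the record states which source version and script version produced a given output.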
6. 3
Choose your Grain Size
• The document is the
traditional grain size
(Dublin Core)
• Linked data allows for
deep links into data
• Cost versus usefulness
• Are you the right party to provide detailed descriptions?
http://creatingandeducating.blogspot.nl/2011/11/blog-post.html
8. 5
• Information is not always compatible
• Make explicit in which context the information holds ...
• ... and who stated the information, why and how.
Contextualize!
Flat Earth and Square Earth idea courtesy of Szymon Klarman
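One way to make context explicit is to attach it, together with the source and the method, to every statement. The sketch below assumes a simple in-memory model; the class, the field names, and the `ex:` identifiers are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    context: str    # the context in which the statement holds
    stated_by: str  # who stated the information
    method: str     # why and how it was stated


def holds_in(statements, context):
    """Return only the statements that hold in the given context."""
    return [s for s in statements if s.context == context]


def conflicts(statements):
    """Map each (subject, predicate) with incompatible values to its values per context."""
    seen = {}
    for s in statements:
        seen.setdefault((s.subject, s.predicate), {})[s.context] = s.obj
    return {key: by_ctx for key, by_ctx in seen.items()
            if len(set(by_ctx.values())) > 1}


# Hypothetical example data: two incompatible views on the same subject.
STATEMENTS = [
    Statement("ex:Earth", "ex:shape", "flat", "ex:flat-earth-view", "ex:alice", "asserted"),
    Statement("ex:Earth", "ex:shape", "square", "ex:square-earth-view", "ex:bob", "asserted"),
]
```

With the context recorded, both statements can be stored side by side, and a consumer decides which context to query rather than receiving a contradiction.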
24. Definition
(Oxford English Dictionary)
• The fact of coming from some particular source or quarter;
origin, derivation;
• the history or pedigree of a work of art, manuscript, rare
book, etc.;
• concretely, a record of the passage of an item through its
various owners.
25. Making trust judgements
Liability, trust and privacy
in open government data
Compliance and auditing
of business processes
Licensing and attribution
of combined information
26. Curt Tilmes, Peter Fox, Xiaogang Ma, Deborah L. McGuinness, Ana Pinheiro Privette, Aaron Smith, Anne Waple,
Stephan Zednik, Jinguang Zheng: Provenance Representation for the National Climate Assessment in the Global
Change Information System. IEEE T. Geoscience and Remote Sensing 51(11): 5160-5168 (2013)
Integrated & Summarized Data
Transparency and Trust
“Provenance is the number one
issue that we face when publishing
government data in data.gov.uk”
John Sheridan, UK National Archives, data.gov.uk
27. Provenance?
• Provenance = Metadata?
Provenance can be seen as metadata, but not all metadata is
provenance
• Provenance = Trust?
Provenance provides a substrate for deriving different trust
metrics
• Provenance = Authentication?
Provenance records can be used to verify and authenticate
amongst users
33. Three Dimensions
• Content (recording, annotating, workflows)
Capturing and representing provenance information
• Management (scalability, interoperability)
Storing, querying, and accessing provenance information
• Use (trust, accountability, compliance, explanation, debugging)
Interpreting and understanding provenance in practice
36. W3C PROV Standard
Luc Moreau & Paul Groth
Provenance is a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing.
http://www.w3.org/TR/prov-overview
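The definition above maps onto the three core PROV classes: entities, activities, and agents. The sketch below writes a small Turtle description using PROV-O terms from the `http://www.w3.org/ns/prov#` namespace. The helper function, the `ex:` namespace, and the resource names are assumptions chosen for illustration.

```python
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/"  # hypothetical namespace for the example resources


def describe_conversion(output, source, activity, agent):
    """Return Turtle stating how `output` was produced from `source`."""
    lines = [
        f"@prefix prov: <{PROV}> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"ex:{source} a prov:Entity .",
        f"ex:{agent} a prov:Agent .",
        f"ex:{activity} a prov:Activity ;",
        f"    prov:used ex:{source} ;",
        f"    prov:wasAssociatedWith ex:{agent} .",
        f"ex:{output} a prov:Entity ;",
        f"    prov:wasGeneratedBy ex:{activity} ;",
        f"    prov:wasDerivedFrom ex:{source} ;",
        f"    prov:wasAttributedTo ex:{agent} .",
    ]
    return "\n".join(lines)


# Hypothetical usage: a Linked Data set derived from a spreadsheet by a civil servant.
turtle = describe_conversion("linked_dataset", "spreadsheet", "conversion_run", "civil_servant")
```

The resulting text can be loaded into any RDF store; building it from plain strings keeps the sketch free of library dependencies.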
41. Naive Approaches
InProv: Visualizing Provenance Graphs with Radial Layouts and Time-Based Hierarchical Grouping
Madelaine D. Boyd - http://www.seas.harvard.edu/sites/default/files/files/archived/Boyd.pdf
Orbiter has several limitations. It does not have capabilities for query subgraph highlighting, regular expression filters, process grouping, annotations, or programmable views[16]. Furthermore, the structure of each summary node, where child nodes are grouped within parents and are hidden until the parent is expanded, benefits queries earlier in the dependency chain. Initial overviews often correspond with system bootup, and appear very similar across different traces (time slices of system activity).
Figure 10: In these screenshots of Orbiter, the presence of edges overwhelms the visibility of nodes. By relying on a node-link graph layout and using spatial location to encode object relationships, Orbiter’s graph layout algorithm must draw many long edges to communicate node connections. Without edge bundling or opacity variation, the meanings of these relationships are obscured.
Another one of Orbiter’s weaknesses is its node-link diagram layout. As a result, each node’s position in the X-Y plane and the length and angle of connecting lines are wasted attributes. The chosen graph layout algorithm (dot by default) arranges nodes to minimize
Figure 11: (Top): A screenshot of the portion of the graph generated by GraphViz for a trace of the third provenance challenge. (Bottom): A zoomed-in view of the same graph. The horizontal black bars across the images are dense collections of edges.
Effective large graph visualizations present the user with a summary view that can be explored, filtered, and expanded interactively.
2.5 Tree Visualization
While trees are a subcategory of graphs, because of their hierarchical composition, tree visualization forms its own subfield of research. A survey of over two-hundred tree visualizations is given at Hans-Jörg Schulz’s treevis.net. Visitors can narrow down by dimensionality (2D, 3D, or mixed), representation (explicit node-link diagram, implicit treemap, or combination), alignment (XY plot, radial layout, or free diagram)[55]. These categories are shown
Figure 12: Left: Pajek uses various summary node-link and matrix-based representations depending on the structure of the supplied data set. Pictured is a main core subgraph extracted from routing data on the Internet. Right: TopoLayout optimizes the choice of visualization display depending on the underlying graph structure. The right column is TopoLayout’s output, while the left and middle columns are the outputs of the GRIP and FM graph layout algorithms.
Figure 13: treevis.net defines different categories for tree maps. Tree maps can be categorized by dimensionality (2D, 3D, or mixed), representation (explicit, implicit, or mixed), or alignment (XY, radial, or spring).
Tree visualizations are either explicit or implicit. Explicit representations resemble node-link diagrams. An example of an implicit representation is a tree map, a diagram where the entire tree is inscribed in a rectangle representing the root node. This root is subdivided hierarchically into more rectangles, which represent child nodes, and each child node is subdivided into more child nodes. Treemaps are excellent for displaying hierarchical or categorical data[57]. One famous example, shown in Figure 14, is the “Map of the Market” from SmartMoney.com, which displays in red and green the changes in market value of publicly-traded companies, grouped by market sector, with cell size proportional to market capitalization[64].
TreePlus is an example of a tree-inspired graph visualization tool (Figure 15). It uses the guiding metaphor of “plant a seed to watch it grow” to summarize navigation of its tree-
43. Width of activities and entities is based on information flow
Activities and entities are extracted from an ego graph
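The two steps on this slide can be sketched as follows, under stated assumptions: the ego graph is taken to be every node within a fixed number of hops from a focal node (ignoring edge direction), and each edge is assumed to carry one unit of flow, so that a node's width is the larger of its incoming and outgoing edge counts. Neither the radius nor the flow measure comes from the slides; the function names are hypothetical.

```python
from collections import deque


def ego_graph(edges, focus, radius=2):
    """Return the edges among nodes within `radius` hops of `focus` (direction ignored)."""
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)
    # Breadth-first search that stops expanding at the given radius.
    distance = {focus: 0}
    queue = deque([focus])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue
        for nxt in neighbours.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return [(s, d) for s, d in edges if s in distance and d in distance]


def node_widths(edges):
    """Width per node: max(incoming, outgoing) edge count, assuming unit flow per edge."""
    incoming, outgoing = {}, {}
    for src, dst in edges:
        outgoing[src] = outgoing.get(src, 0) + 1
        incoming[dst] = incoming.get(dst, 0) + 1
    nodes = set(incoming) | set(outgoing)
    return {n: max(incoming.get(n, 0), outgoing.get(n, 0)) for n in nodes}
```

Restricting the view to an ego graph keeps the diagram small enough to read, which addresses the edge clutter seen in the naive node-link layouts.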
Application Developers
want to consume data
"We need an intuitive REST-like API to integrate Open
Government data. Dealing with all these different formats
and identifiers is really taking too much time."
Civil Servant
wants to publish data
"I have all this data, and I want to make (part of) it
available for the general public, but haven't a clue how!"
Apps and applications
Visual interactions with Open Data.
Application-specific logic (e.g. 'danger')
CitySDK API
HTTP API to the CitySDK
Returns JSON, Turtle, etc.
(includes the Linked Data API of CitySDK)
SPARQL API
SPARQL Endpoint to the Linked
Data storage of the ODE
Partial Synchronisation
CitySDK Datastores
Linked Data Triplestore
Feed into
Query
Orchestrator
Amsterdam Open Data Exchange
HTTP API to 'canned queries' across multiple datasets.
Returns JSON-LD, Turtle
Data Integrator
ODE Best Practices
Best practices for publishing Open Data
CitySDK Ingestion Plugins
"Standard" adapters part of CitySDK
ODE Ingestion Adapters
Ingestion adapters developed within
ODE
Municipal Legacy Systems
Excel Files
Amsterdam Open Data CKAN
Amsterdam Open Data Catalog
Will point to datasets in the ODE
May provide a direct query interface on top of ODE
Wrapper-based
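The 'canned queries' of the Amsterdam Open Data Exchange could be sketched as named, parameterised SPARQL templates that the HTTP API fills in before sending them to the SPARQL endpoint. The query name, the parameters, and the validation rule below are assumptions for illustration and do not describe the actual ODE implementation; the DCAT and Dublin Core terms used in the query are standard vocabulary terms.

```python
import re
from string import Template

# Hypothetical catalogue of canned queries, keyed by the name exposed in the API.
CANNED_QUERIES = {
    "datasets_by_keyword": Template(
        "PREFIX dcat: <http://www.w3.org/ns/dcat#>\n"
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?dataset ?title WHERE {\n"
        "  ?dataset a dcat:Dataset ;\n"
        "           dct:title ?title ;\n"
        '           dcat:keyword "$keyword" .\n'
        "} LIMIT $limit"
    ),
}

# Only letters, digits, underscores, spaces and hyphens are accepted as values.
SAFE_VALUE = re.compile(r"[\w \-]+")


def build_query(name, **params):
    """Fill a canned query template after rejecting unsafe parameter values."""
    for key, value in params.items():
        if not SAFE_VALUE.fullmatch(str(value)):
            raise ValueError(f"unsafe value for parameter {key!r}")
    return CANNED_QUERIES[name].substitute(**params)
```

Application developers then call a simple endpoint with a query name and a few parameters, and never have to write SPARQL or deal with the underlying formats and identifiers themselves.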