The document discusses using data repositories to archive and provide access to research data on the web. It recommends expressing data as described resources with URIs and selecting vocabularies. Repositories will take care of serving the data and metadata, ensuring long-term access. Formats should be translated using content negotiation so everyone can access data in their preferred format. Next-generation archives will publish linked open metadata and facilitate moving data between repositories.
1. Data Archiving and Networked Services
Digital Archiving 3.0
“My data open on the Web, ok but how ?”
Christophe Guéret (@cgueret)
Open Data on the Web, 23 - 24 April 2013
DANS is an institute of KNAW and NWO
2. A bit of context
http://cedar-project.nl
http://easy.dans.knaw.nl
3. Put your data open on the Web!
“E-Data & Research”, October 2011
“Sharing knowledge: EC-funded projects on scientific information in the digital age”
4. Where is your research data ?
“Just get it from the web site of the research project”
“I think I have it somewhere on a stick, let me check...”
“It is available as an RDF/XML dump on my test server”
5. All bad answers, really.
● We need research data to be
– Accessible/readable/usable by anyone
– Available many (>1) years from now
– With traceable provenance and usages
● Dumping the data on a web site somewhere is not enough
6. Solution: use a repository
“Sharing knowledge: EC-funded projects on scientific information in the digital age”
● Data repositories will take over serving the data and have a page for it!
● Repositories hold two types of data
– The data stored
– The meta-data about this data
7. Which format for meta-data ?
● LOD is a perfect fit for describing data
– Used to refer to and link data items
– Facilitates discovery, easy to crawl/index
– One description per data item stored
– Redirects to actual location of the data
● Remaining question: how much meta-data is needed?
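The "one description per data item, redirecting to the actual location" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Description` fields, the `REGISTRY` content, the `example.org` URIs and the choice of a 303 status code are all hypothetical and not taken from any particular repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    """Metadata for one stored data item (field names are illustrative)."""
    uri: str        # URI identifying this description
    title: str      # human-readable label, useful for discovery
    links: tuple    # URIs of related data items
    location: str   # where the data currently lives


# Hypothetical registry holding exactly one description per data item.
REGISTRY = {
    "item-1": Description(
        uri="http://repository.example.org/id/item-1",
        title="Example census table",
        links=("http://repository.example.org/id/item-2",),
        location="http://storage.example.org/files/item-1.csv",
    ),
}


def resolve(item_id):
    """Return an HTTP-style (status, headers) pair for a data item.

    Known items are redirected to the current location of the data
    (303 is an assumption here); unknown items yield 404.
    """
    description = REGISTRY.get(item_id)
    if description is None:
        return 404, {}
    return 303, {"Location": description.location}
```

Because clients only ever see the description URI, the repository could move the file and update `location` without breaking existing links.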
8. Which format for the data?
● Many formats around: PDF, SDF, DSPL, XLS, RDF, CSV, SHP, JSON-LD, ...
● Translation will imply some extra work for the data owner and will not please everyone
9. Which format for the data?
● Express your data as described resources
● Buy a DN, decide on a URI scheme for your data
● Select vocabularies to describe your resources
10. Solution: use a repository
● Just get the data in the repository
● Repositories will take care of serving your data
● Data repositories will take over everything
● PS: forget about HTTP URIs for data
11. Format evolution
● Use content negotiation to translate and serve different data formats
● Ensure everyone gets the format they want
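A minimal Python sketch of content negotiation follows: the client states its preferred formats in an `Accept` header and the repository translates the stored data on the fly. The function names (`parse_accept`, `negotiate`, `serve`), the two serializers and the sample rows are assumptions made for illustration. The parser only handles exact media types, `*/*` and `q` values, not partial wildcards such as `text/*`.

```python
import csv
import io
import json


def to_csv(rows):
    """Serialize a list of dicts as CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize a list of dicts as JSON text."""
    return json.dumps(rows)


# Formats this hypothetical repository can translate to.
SERIALIZERS = {"text/csv": to_csv, "application/json": to_json}


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, best first."""
    choices = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        choices.append((media_type, q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(choices, key=lambda choice: choice[1], reverse=True)


def negotiate(accept_header, available, default="text/csv"):
    """Pick the best available media type, or None if nothing matches."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return default
    return None


def serve(rows, accept_header):
    """Return (status, content_type, body) for the requested format."""
    media_type = negotiate(accept_header, SERIALIZERS)
    if media_type is None:
        return 406, None, ""  # 406 Not Acceptable
    return 200, media_type, SERIALIZERS[media_type](rows)
```

For example, `serve(rows, "application/json;q=0.9, text/csv;q=0.5")` would return the JSON serialization, while a client asking only for `application/pdf` would get a 406 response in this sketch.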
13. Next generation archives
● Provide long term access to data in several formats
● Publish Linked Open Meta-Data about the data stored (DCAT, ...)
● Facilitate moving data around archives (LDP, ...)
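As a rough illustration of the second point, the Python sketch below builds a JSON-LD description of one stored dataset using DCAT and Dublin Core terms, with one distribution per available format. The helper name `dcat_description`, the `example.org` URIs and the dataset title are hypothetical; only the vocabulary terms (`dcat:Dataset`, `dcat:distribution`, `dcat:Distribution`, `dcat:downloadURL`, `dcat:mediaType`, `dct:title`) come from the published vocabularies.

```python
import json


def dcat_description(dataset_uri, title, distributions):
    """Build a JSON-LD document describing a dataset with DCAT.

    `distributions` is a list of (media_type, download_url) pairs,
    one per format in which the archive offers the data.
    """
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": url},
                "dcat:mediaType": media_type,
            }
            for media_type, url in distributions
        ],
    }


# Hypothetical dataset offered in two formats.
document = dcat_description(
    "http://archive.example.org/id/dataset-42",
    "Example census table",
    [
        ("text/csv", "http://archive.example.org/files/dataset-42.csv"),
        ("application/json", "http://archive.example.org/files/dataset-42.json"),
    ],
)
serialized = json.dumps(document, indent=2)
```

Publishing such a document next to each stored item would let crawlers and other archives discover which formats are on offer without downloading the data itself.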