A short presentation on the lifecycle of scientific data and how it relates to the Glastir Monitoring and Evaluation Programme (GMEP). GMEP is effectively a "real-time" health-check system for Glastir, the new Welsh agri-environment scheme.
Processing of scientific data: from field capture to web delivery
1. Processing of scientific data
From field capture to web delivery
Hector Quintero Casanova
Postgraduate in e-Science
2. Why e-Science? Data-intensive
● GMEP ticks all the boxes:
✔ Highly multidisciplinary: social, landscape, water, birds, plants...
✔ Large volumes of data: covers the whole of Wales.
✔ Cross-organisational collaboration: 13 institutions.
3. Why e-Science? Metadata
● NERC's data policy says it all
  – “It is essential that metadata are submitted”
● Metadata = context information about data
  – Provenance = who, when, where, how
    ● Exposes data relationships → traceability
  – Workflow = how. Essential if using models
    ● Enables reproducing outcome → repeatability
● Exactly what information depends on the stage.
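
As a rough illustration of the provenance and workflow metadata described on this slide, the sketch below stores both alongside a dataset identifier. It is only one possible shape, not a NERC or GMEP schema: the class names, field names and sample values (`example-survey-001`, the date, the file name) are all hypothetical.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Provenance:
    # Who, when, where, how: the context listed under provenance.
    who: str
    when: str   # ISO 8601 date, kept as text for simplicity
    where: str
    how: str


@dataclass
class WorkflowStep:
    # One processing step: what ran, on which inputs, with which settings.
    name: str
    inputs: list
    parameters: dict = field(default_factory=dict)


@dataclass
class DatasetMetadata:
    dataset_id: str
    provenance: Provenance
    workflow: list = field(default_factory=list)

    def to_json(self) -> str:
        # sort_keys gives a stable text form that is easy to compare.
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical sample values, for illustration only.
record = DatasetMetadata(
    dataset_id="example-survey-001",
    provenance=Provenance(
        who="field surveyor",
        when="2013-06-01",
        where="example site, Wales",
        how="handheld data logger",
    ),
    workflow=[WorkflowStep(name="quality_check", inputs=["raw_counts.csv"])],
)
print(record.to_json())
```

Keeping provenance and workflow as separate parts mirrors the split on the slide: the first supports traceability, the second repeatability.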
6. Data analysis
● Workflow metadata avoids costly reruns
  – Identify model output needed → reuse
● But not enough for cross-organisation collab.
  – 13 institutions in Glastir.
  – Differences in storage structure, metadata defs...
● Need extra layer(s) for seamless access
  – Web already offers tools needed.
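
One way to picture how workflow metadata avoids reruns: derive a key from the recorded workflow (model, inputs, parameters) and look up existing output before running the model again. The sketch below assumes an in-memory cache and a toy model; `workflow_key`, `run_or_reuse` and `toy_model` are hypothetical names, and sorting the inputs assumes their order does not affect the result.

```python
import hashlib
import json

_cache = {}  # workflow key -> stored model output


def workflow_key(model: str, inputs: list, parameters: dict) -> str:
    # Hash a canonical JSON form so identical workflows get identical keys.
    # Assumption: input order is irrelevant, so inputs are sorted.
    canonical = json.dumps(
        {"model": model, "inputs": sorted(inputs), "parameters": parameters},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_or_reuse(model, inputs, parameters, run):
    # Return (output, reused); only call run() when no output is stored.
    key = workflow_key(model, inputs, parameters)
    if key in _cache:
        return _cache[key], True
    result = run(inputs, parameters)
    _cache[key] = result
    return result, False


calls = []  # records how many times the toy model really runs


def toy_model(inputs, parameters):
    # Stand-in for an expensive model run.
    calls.append(1)
    return len(inputs) * parameters["scale"]


first, reused_first = run_or_reuse(
    "toy", ["a.csv", "b.csv"], {"scale": 2}, toy_model
)
second, reused_second = run_or_reuse(
    "toy", ["b.csv", "a.csv"], {"scale": 2}, toy_model
)
print(first, reused_first, second, reused_second, len(calls))
```

This works inside one institution. Across institutions the same lookup would need agreed metadata definitions and shared access, which is the gap the extra layer is meant to fill.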
7. Publication: linked data
● HTTP for generic retrieval of resources
● URIs for unique identification of those resources
  – E.g. http://www.ceh.ac.uk
● Both can be used to build web services
  – Amount to remote functions.
  – E.g. seamless recording of workflows across institutions.
● Semantics for automated reasoning
  – Acts as standardised metadata aimed at machines.
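
To make "semantics for automated reasoning" concrete, the sketch below keeps metadata as subject-predicate-object statements identified by URIs and derives one new fact from them. It is a plain-Python stand-in, not an RDF library or a GMEP vocabulary: the `http://example.org/vocab/` namespace, the predicate names and the dataset are hypothetical. Only `http://www.ceh.ac.uk` comes from the slide.

```python
from urllib.parse import urlparse

CEH = "http://www.ceh.ac.uk"       # URI given on the slide
EX = "http://example.org/vocab/"   # hypothetical vocabulary namespace

# Each statement is a (subject, predicate, object) triple of URIs.
triples = {
    (EX + "dataset/1", EX + "producedBy", CEH),
    (EX + "dataset/1", EX + "type", EX + "BirdSurvey"),
    (EX + "BirdSurvey", EX + "subClassOf", EX + "FieldSurvey"),
}


def infer_types(statements):
    # If X has type A and A is a subclass of B, then X also has type B.
    # Repeat until no new statement appears.
    inferred = set(statements)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != EX + "type":
                continue
            for s2, p2, o2 in list(inferred):
                if p2 == EX + "subClassOf" and s2 == o:
                    new = (s, EX + "type", o2)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


def objects(statements, subject, predicate):
    # All objects stated for a given subject and predicate.
    return {o for s, p, o in statements if s == subject and p == predicate}


closed = infer_types(triples)
print(sorted(objects(closed, EX + "dataset/1", EX + "type")))
print(urlparse(CEH).netloc)
```

The dataset was never explicitly labelled a field survey. That fact follows from the machine-readable class relationship, which is the kind of reasoning standardised metadata makes possible.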