Perspectives on Collaborative Research Environments offered by D4Science
D4science-II Codata
1. D4Science:
An e-Infrastructure for Facilitating Fisheries
and Aquaculture Resource Management
Pasquale Pagano
National Research Council of Italy
pasquale.pagano@isti.cnr.it
22nd International CODATA
24-27 October 2010
Cape Town (South Africa)
www.d4science.eu
2. Assumptions
www.d4science.eu, D4Science, 22nd International CODATA, Cape Town, 24-27 October 2010
Consolidated facts:
Very rich applications and data collections are currently maintained by a
multitude of authoritative providers
Different problems require different execution paradigms: batch,
map-reduce, synchronous call, message-queue, …
Key distributed computation technologies exist: grid (gLite and Globus),
distributed resource management (Condor), clusters (Hadoop), …
Several standards are adopted in the same domain
Societal observations:
• A rich variety of protocols, models, and formats
• creates barriers to the use of resources
• dramatically delays new exploitation patterns
Technical observations:
Heterogeneity of protocols, models, and formats increases load
Load increases failures
3. D4Science Vision
D4Science objectives:
hide heterogeneity, i.e. abstract over differences in location,
protocol, and model;
embrace heterogeneity, i.e. allow for multiple locations,
protocols, and models;
Technical goals:
no bottlenecks: scale no less than the interfaced resources
no outages: keep failures partial and temporary
autonomicity: system reacts and recovers
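One way to picture the objectives and goals above is an adapter layer: each provider sits behind a common interface, and a federated query keeps going when one provider fails. The Python sketch below is illustrative only. The names `Source`, `InMemorySource`, `FailingSource`, and `federated_search` are hypothetical and are not part of gCube.

```python
from typing import Iterable, List, Protocol, Tuple


class Source(Protocol):
    """Common interface hiding location, protocol, and model (hypothetical)."""
    name: str

    def search(self, term: str) -> List[str]: ...


class InMemorySource:
    """Stand-in for one provider; a real adapter would speak its own protocol."""

    def __init__(self, name: str, records: Iterable[str]) -> None:
        self.name = name
        self._records = list(records)

    def search(self, term: str) -> List[str]:
        return [r for r in self._records if term.lower() in r.lower()]


class FailingSource:
    """Stand-in for a provider that is temporarily unreachable."""

    def __init__(self, name: str) -> None:
        self.name = name

    def search(self, term: str) -> List[str]:
        raise ConnectionError(f"{self.name} is unreachable")


def federated_search(sources: Iterable[Source], term: str) -> Tuple[List[str], List[str]]:
    """Query every source; a failing source gives a partial result, not an outage."""
    hits: List[str] = []
    failed: List[str] = []
    for source in sources:
        try:
            hits.extend(source.search(term))
        except ConnectionError:
            failed.append(source.name)  # remembered so the caller can retry later
    return hits, failed


# Made-up providers and records, for illustration only.
sources = [
    InMemorySource("a", ["Tuna catch 2008", "Cod catch 2008"]),
    FailingSource("b"),
    InMemorySource("c", ["Tuna distribution map"]),
]
hits, failed = federated_search(sources, "tuna")
```

In this sketch the caller gets the hits from the reachable providers together with the names of the ones that failed, so a retry can be scheduled without blocking the whole query.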
6. Infrastructure Exploitation
Nodes, Collections, Functionality
30 Nodes: CNR, NKUA, ESA, FAO, UNIBASEL
25 Data: EEA, MERIS, AATSR
69 Metadata: es, ISO19115, eiDB
15 Data: AquaMaps, Fact Sheets, Country Maps
28 Metadata: FARM_dc, aquamaps
29 Nodes: CNR, NKUA, FAO, UNIBASEL
Functionality:
• Integration with gPod
• Geographical and text search
• Search by metadata
• Personal workspace
• Objects annotation
• Report generation
• Maps generation
• Time Series management
Production
More than 500 autonomic Web Services
7. gCube as a Digital Library System
A Digital Library System is a possibly distributed system that
collects, manages and preserves for the long term rich digital
content, and offers to its user communities specialised
functionality on that content, of measurable quality and according
to codified policies
[The Digital Library Reference Model]
The gCube data infrastructure enabling framework provides DL
functionality by:
Federating existing digital content maintained in a variety of
tailored repository systems
Supporting the generation of new digital content by exploiting
heterogeneous computational platforms
Providing discovery and access capabilities on diversely described
and modeled digital content
8. gCube as an e-Infrastructure ecosystem enabling framework
gCube realises an e-Infrastructure ecosystem
by bridging a number of well-established systems and
standards from various domains,
including high-energy physics, biodiversity, fishery and aquaculture
resources management
10. Why is sharing through VREs key?
Through the VRE, groups of users have controlled access to
distributed data and services integrated under a
personalised interface.
11. Why is sharing through VREs key?
A Virtual Research Environment (VRE) supports cooperative
activities:
Metadata cleaning, enrichment, and transformation by exploiting
mapping schemas, controlled vocabularies, thesauri, and ontologies;
Process refinement and showcase implementation (restricted to a
set of users);
Data assessment (required to make data publicly exploitable by VO
members);
Expert users' validation of products generated through data elaboration
or simulation.
12. Why is sharing through VREs key?
The VRE integrated environment offers a set of functionality
to support and perform research activities:
the ability to integrate heterogeneous data and services,
the ability to process information on demand and ingest the
results,
to share data and processes with other users,
to customize collections of information,
to store user actions and exploit them for further use,
to aggregate relevant information into ad-hoc information
sources and keep them updated.
19. VRE Facilities
[Figure: VRE facilities. Captions: tools supporting specific tasks; a virtual live document to describe research results; a virtual desktop to organize the working environment. Components: Workspace, Species Maps Generation, Time Series Management, Report Management, Search, Annotation, Visualisation, Storage, Transformation, …]
20. Workspace
A collaboration-oriented suite providing
seamless access and organisation facilities on a rich array of
objects (e.g. Information Objects, Queries, Files, Templates)
mediation between external world objects, systems and
infrastructures (import/export/publishing)
support for common file manager features (drag & drop, contextual menu)
support for an effective rich object sharing facility
21. Species Distribution Maps Generation
AquaMaps is an application*
tailored to predict global distributions of marine species
initially designed for marine mammals and subsequently
generalised to marine species,
that generates color-coded species range maps using
half-degree latitude and longitude blocks
by interfacing several databases and repository providers
* Algorithm by Kaschner et al. 2006
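For orientation, a half-degree grid divides the globe into 720 × 360 = 259,200 blocks. The sketch below shows one possible way to map a coordinate to such a block. The row and column convention and the function name `half_degree_cell` are assumptions made for illustration; they do not reproduce the cell identifiers actually used by AquaMaps.

```python
import math
from typing import Tuple

CELL_SIZE = 0.5                  # degrees, as stated on the slide
N_COLS = int(360 / CELL_SIZE)    # 720 columns of longitude
N_ROWS = int(180 / CELL_SIZE)    # 360 rows of latitude


def half_degree_cell(lat: float, lon: float) -> Tuple[int, int]:
    """Return (row, col) of the half-degree block containing (lat, lon).

    Assumed convention: row 0 starts at 90 degrees north,
    column 0 starts at 180 degrees west.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate out of range")
    # Clamp so that the southern and eastern edges fall in the last block.
    row = min(int(math.floor((90.0 - lat) / CELL_SIZE)), N_ROWS - 1)
    col = min(int(math.floor((lon + 180.0) / CELL_SIZE)), N_COLS - 1)
    return row, col
```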
22. Species Distribution Maps Generation
AquaMaps execution is based on the gCube Ecological Niche
Modelling Suite, which allows the extrapolation of known species
occurrences
◦ to determine environmental envelopes (species tolerances)
◦ to predict future distributions by matching species tolerances
against local environmental conditions (e.g. climate change
and sea pollution)
Very large volume of input and output data: HSPEC native range 56,468,301 - HSPEC suitable range 114,989,360
Very large number of computations: one multispecies map computed on 6,188 half-degree cells (over 170k) and 2,540 species
requires 125 million computations (Eli E. Agbayani, FishBase Project/INCOFISH WP1, WorldFish Center)
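The idea of matching species tolerances against local conditions can be illustrated with a simple envelope function. The sketch below assumes a trapezoidal envelope per environmental parameter (absolute minimum, preferred minimum, preferred maximum, absolute maximum) and multiplies the per-parameter scores. It is a simplified illustration, not the gCube or AquaMaps implementation, and every name and number in it is made up.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Envelope:
    """Hypothetical tolerance range for one environmental parameter."""
    minimum: float
    preferred_min: float
    preferred_max: float
    maximum: float

    def score(self, value: float) -> float:
        """1.0 inside the preferred range, falling linearly to 0.0 at the limits."""
        if value < self.minimum or value > self.maximum:
            return 0.0
        if value < self.preferred_min:
            return (value - self.minimum) / (self.preferred_min - self.minimum)
        if value > self.preferred_max:
            return (self.maximum - value) / (self.maximum - self.preferred_max)
        return 1.0


def suitability(envelopes: Dict[str, Envelope], conditions: Dict[str, float]) -> float:
    """Product of per-parameter scores for one cell (assumed combination rule)."""
    result = 1.0
    for parameter, envelope in envelopes.items():
        result *= envelope.score(conditions[parameter])
    return result


# Made-up tolerances and cell conditions, for illustration only.
species = {
    "temperature": Envelope(5.0, 10.0, 20.0, 25.0),
    "salinity": Envelope(30.0, 33.0, 36.0, 38.0),
}
cell = {"temperature": 22.5, "salinity": 34.0}
cell_suitability = suitability(species, cell)
```

Running such a function for every species in every cell is what makes the number of computations grow with the product of the two counts.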
23. Time Series Management
Offers a set of tools to manage capture statistics
Supports the complete TS lifecycle
Supports validation, curation, and analysis
Provides support for data reallocation
Produces uniform data sets
24. Time Series
Offers a set of tools to operate on capture statistics
Multiple key families support
Filtering, grouping, and aggregation
Union
Mining
Automatically produces provenance information
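The operations listed above can be pictured on a toy table of capture records. The sketch uses plain Python. The record layout (`country`, `species`, `year`, `tonnes`), the values, and the function names are hypothetical; they only illustrate filtering, grouping with aggregation, and union, not the gCube Time Series tools themselves.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Record = Dict[str, object]

# Made-up capture records, for illustration only.
series_a: List[Record] = [
    {"country": "IT", "species": "TUN", "year": 2007, "tonnes": 10.0},
    {"country": "IT", "species": "COD", "year": 2007, "tonnes": 4.0},
    {"country": "GR", "species": "TUN", "year": 2007, "tonnes": 6.0},
]
series_b: List[Record] = [
    {"country": "IT", "species": "TUN", "year": 2008, "tonnes": 12.0},
]


def union(*series: Iterable[Record]) -> List[Record]:
    """Concatenate several series that share the same columns."""
    return [record for one_series in series for record in one_series]


def filter_records(records: Iterable[Record],
                   predicate: Callable[[Record], bool]) -> List[Record]:
    """Keep only the records for which the predicate holds."""
    return [record for record in records if predicate(record)]


def aggregate(records: Iterable[Record],
              keys: Sequence[str]) -> Dict[Tuple[object, ...], float]:
    """Group by the given key columns and sum the tonnes in each group."""
    totals: Dict[Tuple[object, ...], float] = defaultdict(float)
    for record in records:
        totals[tuple(record[k] for k in keys)] += float(record["tonnes"])
    return dict(totals)


tuna = filter_records(union(series_a, series_b), lambda r: r["species"] == "TUN")
tuna_by_country = aggregate(tuna, ["country"])
```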
26. Report Management
A collaboration-oriented suite providing
template-oriented, feature-rich and flexible document format
definition
effective and infrastructure-integrated report compilation (drag &
drop workspace items)
collaborative and distributed editing (workspace based)
standard-based report materialisation (HTML, OpenXML)
28. gCube and Humanities: the gMan case
JISC - King’s College London
Look at new ways of integrating existing data resources for Classics and
add services so that research work based on integrated resources can be
published
Data sources
The Heidelberger Gesamtverzeichnis (HGV) der griechischen Papyrusurkunden
Aegyptens, a collection of metadata records for 55,000 Greek papyri from Egypt.
Projet Volterra, a database of Roman legal texts, and associated metadata, from
various sources (epigraphic, papyrological, or literary) currently in the low tens of
thousands but very much in progress.
The Inscriptions of Aphrodisias, (InsAph), a corpus of about 2,000 ancient Greek
inscriptions from the Roman city of Aphrodisias in Asia Minor, including transcribed
texts and metadata marked up using EpiDoc TEI, as well as images of the physical
objects.
Main functionality
cross-collection search
workspace
annotation
report creation
Early results in “AHM 2009 Phil. Trans. A special issue”
29. VRE Summary
D4Science approach:
• Heterogeneous resources are accessible in a common
ecosystem of resources
• regardless of their locations, technologies, and protocols
• Different communities have access to different views
• according to the conditions under which the sharing can occur
• Each community can define its own virtual research
environment to satisfy specific needs
• for a limited timeframe and at no cost for the providers of the
resource
• Several virtual research environments can coexist
• without interfering with each other, even when competing for the same
resources
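The points above can be pictured as per-community views over one shared pool of resources. The sketch below is only an illustration of that idea: the classes `Resource` and `VRE`, the community names, and the dates are hypothetical and do not describe how D4Science implements VREs.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List


@dataclass(frozen=True)
class Resource:
    """A shared resource and the communities its provider allows (hypothetical)."""
    name: str
    shared_with: FrozenSet[str]


@dataclass
class VRE:
    """A community's view over the common pool, valid for a limited timeframe."""
    community: str
    expires: date

    def view(self, pool: List[Resource], today: date) -> List[str]:
        if today > self.expires:
            return []  # the environment has been retired
        return [r.name for r in pool if self.community in r.shared_with]


# Made-up pool and environments, for illustration only.
pool = [
    Resource("species maps", frozenset({"fisheries", "biodiversity"})),
    Resource("satellite imagery", frozenset({"environment"})),
    Resource("capture statistics", frozenset({"fisheries"})),
]
fisheries = VRE("fisheries", expires=date(2011, 12, 31))
environment = VRE("environment", expires=date(2011, 12, 31))
```

Both environments read from the same pool, yet each sees only what its community is allowed to see, and neither holds a copy of the resources.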
30. Conclusions
Facts:
Very rich services and data collections are currently maintained by a
multitude of authoritative providers
Several standards are adopted in the same domain
Interoperability approaches are key to exploiting such richness
D4Science offers a variety of patterns, tools, and solutions
to interconnect
Heterogeneous digital content
Heterogeneous repository systems
Heterogeneous computation platforms
with a rich set of free-to-use tailored services
to decrease the cost of adoption
to reduce the time to market of new ideas
to deal with the plethora of standards
32. Supported Standards
WSRF Specifications:
• WS-ResourceProperties (WSRF-RP)
• WS-ResourceLifetime (WSRF-RL)
• WS-ServiceGroup (WSRF-SG)
• WS-BaseFaults (WSRF-BF)
JSR:
• 168 : Simple Portlets
• 286 : 168 update
• 160 : JMX
WSN Specifications:
• WS-BaseNotification
• WS-Topics
• (WS-BrokeredNotification)
• ….
WS-* Standards:
• SOAP
• WSDL
• WS-Addressing
• ….
ISO:
• ISO3166 countries
• ISO4217 currencies
• ISO19115 geo-location
• ….
X-*:
• XML
• XSD
• XSL
• XSLT
• xPath
• xQuery
OGC:
• Web Coverage Processing Service
• Web Coverage Service
• Web Feature Service
• Web Map Context
• Web Map Service
• Web Map Tile Service
• Web Processing Service
• Web Service Common
OGF Standard:
• Glue Schema (2)
…….
Comply with:
OAI-PMH
OAI-ORE
33. Find us
www.d4science.eu
Donatella Castelli
D4Science-II Project Director
donatella.castelli@isti.cnr.it
Pasquale Pagano
D4Science-II Technical Director
pasquale.pagano@isti.cnr.it
Thank You For
Your Attention
Editor's notes
WSRF Specifications
WS-ResourceProperties (WSRF-RP)
WS-ResourceLifetime (WSRF-RL)
WS-ServiceGroup (WSRF-SG)
WS-BaseFaults (WSRF-BF)
JSR
168 : Simple Portlets
286 : 168 update
160 : JMX
WSN Specifications:
WS-BaseNotification
WS-Topics
(WS-BrokeredNotification)
WS-* Standards
SOAP
WSDL
WS-Addressing
ISO:
ISO3166 countries
ISO4217 currencies
ISO19115 geo-location
X-*
XML
XSD
XSL
XSLT
xPath
xQuery
Other
WSRP
OpenGIS
KML
OGF Standard:
Glue Schema (2)
eXtensible Access Control Markup Language (XACML) is an XML-based specification for writing access control policies and for interpreting them
Security Assertion Markup Language (SAML) is an XML specification defining the syntax and processing semantics of security assertions