1. Biodiversity Virtual e-Laboratory
An e-Infrastructure and e-Science environment supporting research
on biodiversity
WEB SERVICES INFRASTRUCTURES
FOR BIODIVERSITY SCIENCE
Alex Hardisty
Coordinator, Cardiff University
EUDAT User Forum, 11-12th March 2013, London
2.
3. Products are “services” and “workflows”
• Workflows allow vast amounts of data to be processed,
repeatedly
– Build your own workflow: select and apply successive
“services” (data analysis and processing steps)
– Import data from your own research and/or from existing
libraries (e.g. GBIF, Catalogue of Life)
• Access a library of workflows and re-use existing workflows
– Improves efficiency by reducing research time and
overhead expenses
[Figure: Part of a workflow to study the ecological niche of the horseshoe crab]
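The idea of building a workflow from successive services can be sketched in a few lines of Python. This is only an illustration: real BioVeL workflows are Taverna workflows calling remote Web services, whereas here each "service" is a local function, and the step names (`drop_missing_coordinates`, `deduplicate`) and the sample records are hypothetical.

```python
from typing import Callable, Iterable, List

# Assumption for illustration: a "service" is a function from records to records.
Service = Callable[[List[dict]], List[dict]]


def drop_missing_coordinates(records: List[dict]) -> List[dict]:
    """Hypothetical data-refinement step: keep records with both coordinates."""
    return [r for r in records if r.get("lat") is not None and r.get("lon") is not None]


def deduplicate(records: List[dict]) -> List[dict]:
    """Hypothetical step: drop repeated (species, lat, lon) occurrences."""
    seen = set()
    unique = []
    for r in records:
        key = (r["species"], r["lat"], r["lon"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def run_workflow(records: List[dict], services: Iterable[Service]) -> List[dict]:
    """Apply each service in order, feeding each output to the next step."""
    for service in services:
        records = service(records)
    return records


# Made-up occurrence records for the Atlantic horseshoe crab.
occurrences = [
    {"species": "Limulus polyphemus", "lat": 39.1, "lon": -75.2},
    {"species": "Limulus polyphemus", "lat": 39.1, "lon": -75.2},
    {"species": "Limulus polyphemus", "lat": None, "lon": -75.0},
]
refined = run_workflow(occurrences, [drop_missing_coordinates, deduplicate])
print(len(refined))  # 1
```

Because the workflow is just an ordered list of steps, the same list can be re-run on new data or shared and re-used, which is the efficiency gain the slide refers to.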
4. Creates powerful data processing tools
for biodiversity research
Ecological Niche Modelling, Biogeochemical Modelling, Metagenomics,
Phylogenetics, Population Modelling, Taxonomy, Geospatial Visualization
• Carbon Sequestration
• Ecosystem Functioning and Valuation
• Invasive Species Management
An international virtual network of experts connecting
2 scientific communities: biodiversity and ICT
• Aims to foster cooperation in the community by:
– Discussing scientific use cases
– Identifying and deploying important Web Services
– Designing and offering workflows
– Training scientists
5. Supported by
many friends
Fits into a portfolio
of initiatives
• NoE: ALTER-Net, EDIT/PESI, LTER-Europe, EuroMarine, etc.
• Projects: 4D4Life, agINFRA, Aquamaps, ArtDataBanken,
BioFresh, Envri, EU BON, EUBrazilOpenBio, Fauna Iberica,
i4Life, iMarine, Micro B3, OpenPlantBio, ViBRANT
• Global: CAMERA, Catalogue of Life, COOPEUS, CReATIVE-B,
EoL, GBIF, GSC Biodiversity WG, TreeBase, and many more
Important contribution
to infrastructure
6. BioVeL Tool Spectrum
[Figure: tool spectrum; layout lost in extraction. Axis labels: "Workflow design, compute science", "Concept Knowledge", "Domain science". User labels: Technical, Science Domain, PAL, Scientist. Tools: Taverna Workbench, Component Builder, Taverna Lite / Server (Taverna Player), Domain-Specific Website. Workflow visibility runs from High to Low.]
7. [Figure: layered architecture diagram; layout lost in extraction. Layer labels: Catalogues & Repositories; Interfaces; Design & Launch Channels; Services; Execution; Infrastructure. Other labels, as extracted: Biodiversity Catalogue (Workflows, Components, Services), BioCatalogue, Curators, Pro, Makers, In the Field, Users, Third Party tools, Taverna Workbench, Taverna Lite, Local / Public / BioVeL Services, Data Mgt Servers, COTS, Shim, Taverna Server, Local File Stores, Run time Workspace, Authentication, Local Data Sets, Data Management, Domain Server, Interaction Server, System Deployment, Cloud (hosting, compute, storage).]
8. We’re at the halfway point
• Several workflows maturing nicely
– Publicly shared: data refinement, population modelling, ecological niche modelling
– Beta: phylogenetic inference
– In the pipeline: biogeochemical process modelling, metagenomics, …
• Using Web services from GBIF, CoL, CRIA, Fraunhofer, INFN, …
– Developing new services: visualization and data selection, phylogenetics, metagenomics,
Biome-BGC modelling, population modelling
• A curated public catalogue of Web services
– www.biodiversitycatalogue.org
• AWS cloud infrastructure, new user interfaces (tavlite1.biovel.eu)
• Growing profile in community
– Steady enquiries from potential users and public training workshops
9. 4 questions to address
1. How to use distributed centres to efficiently run
distributed processing chains?
2. Is there a problem of data exchange?
(And how to solve this)
3. Deploying codes close to data
4. Access and security issues around managing
protected services
10. How to use distributed centres to efficiently
run distributed processing chains?
Users’ workflows and
applications
Service and Data Providers
(INFN, BioVeL, GBIF, CoL,
EBI, BGBM, etc.)
Resource Providers
(EUDAT, EGI.eu, PRACE,
commercial cloud, etc.)
11. Is there a problem of data exchange?
(And how to solve this)
• At the simplest level, the user needs:
– A "starting place", where a workflow can find the data it needs
– An "ending place", where a workflow can put its results
– A "transient place" where temporary data / intermediate results can be
put and retrieved
• For services we need:
– Temporary spaces associated with specific services, supporting data
movements between services
– Separation of users and separation of workflow runs
• In summary:
– A replicated, distributed storage space that BioVeL services (and hence
workflows) can read from and write to, and that appears to the user as a
filespace native to their local environment
• In effect, a Dropbox for services, with fast replication between known service
locations. Today's data volumes are typically gigabytes, not terabytes
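A minimal sketch of the three "places" and of the separation between users and workflow runs, assuming a plain local directory tree stands in for the storage space. The class `RunWorkspace`, its directory names and the user and run identifiers are all hypothetical, and the sketch leaves out replication between service locations entirely.

```python
import tempfile
from pathlib import Path


class RunWorkspace:
    """Hypothetical per-user, per-run workspace exposing the three places."""

    def __init__(self, root: Path, user: str, run_id: str) -> None:
        base = root / user / run_id          # separates users and workflow runs
        self.inputs = base / "inputs"        # "starting place"
        self.transient = base / "transient"  # "transient place"
        self.results = base / "results"      # "ending place"
        for place in (self.inputs, self.transient, self.results):
            place.mkdir(parents=True, exist_ok=True)

    def put(self, place: Path, name: str, data: bytes) -> Path:
        """Write data into one of the places and return the file path."""
        path = place / name
        path.write_bytes(data)
        return path

    def get(self, place: Path, name: str) -> bytes:
        """Read data back from one of the places."""
        return (place / name).read_bytes()


root = Path(tempfile.mkdtemp())
ws_a = RunWorkspace(root, "alice", "run-001")
ws_b = RunWorkspace(root, "bob", "run-001")

ws_a.put(ws_a.inputs, "occurrences.csv", b"species,lat,lon\n")
header = ws_a.get(ws_a.inputs, "occurrences.csv")
ws_a.put(ws_a.transient, "cleaned.csv", header)   # intermediate result
ws_a.put(ws_a.results, "model.txt", b"done")      # final result

# Bob's run with the same run id cannot see Alice's input.
print((ws_b.inputs / "occurrences.csv").exists())  # False
```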
12. Deploying codes close to data
• BioVeL Appliance
– A service packaged for a distributed computing infrastructure (DCI), deployed on demand
– Working with EGI Fedcloud on this
– Could be deployed close to data but this only makes sense
if this would be quicker than moving the data
• So where is the break-even point?
• Taverna Server deployments
– In connection with Web Services hosting Taverna Server
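One way to reason about the break-even point is a simple timing model, sketched below. The model is an assumption, not a BioVeL result: it supposes that moving data costs only transfer time, that deploying an appliance costs a fixed setup time plus the transfer of its image, and that both use the same link. The function names and all numbers are illustrative.

```python
def transfer_seconds(size_gb: float, bandwidth_gbit_s: float) -> float:
    """Time to move size_gb gigabytes over a link of bandwidth_gbit_s Gbit/s."""
    return size_gb * 8 / bandwidth_gbit_s


def break_even_data_gb(setup_seconds: float, image_gb: float,
                       bandwidth_gbit_s: float) -> float:
    """Data volume above which deploying the appliance beats moving the data.

    Moving the data costs     D * 8 / B
    Deploying the code costs  setup + I * 8 / B
    The two are equal when    D = setup * B / 8 + I
    """
    return setup_seconds * bandwidth_gbit_s / 8 + image_gb


# Illustrative numbers only (assumptions, not measurements):
# 10 minutes to start an appliance, a 2 GB image, a 1 Gbit/s link.
threshold = break_even_data_gb(setup_seconds=600, image_gb=2, bandwidth_gbit_s=1)
print(threshold)  # 77.0
```

Under these assumed numbers, data sets smaller than about 77 GB are quicker to move than the appliance is to deploy, which is consistent with the "typically GB not TB" volumes mentioned earlier.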
13. Access and security issues around
managing protected services
• We need a lightweight and standard solution for
– User management & single sign-on to our Service Network
– Permissions system for authorizing access to services
• Same for Workspace Access Service (user workspace)
[Figure: User ↔ contract ↔ SP (service provider) ↔ contract ↔ RP (resource provider)]
14. Access and security issues around
managing protected services
• 3-legged OAuth, extended
– The resource / service is independent of the BioVeL OAuth provider
• Adopt from megx.net
– marine ecological genomics
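The three legs, with the protected service kept independent of the provider, can be sketched as a toy in-memory simulation. This is neither the megx.net nor the BioVeL implementation: the classes `AuthProvider` and `ProtectedService`, the scope name and the client id are hypothetical, and the sketch omits request signing, client secrets, token expiry and transport security.

```python
import secrets
from typing import Dict, Optional, Tuple


class AuthProvider:
    """Hypothetical stand-in for an OAuth provider (in memory only)."""

    def __init__(self) -> None:
        self._codes: Dict[str, Tuple[str, str, str]] = {}
        self._tokens: Dict[str, Tuple[str, str]] = {}

    def authorize(self, user: str, client_id: str, scope: str) -> str:
        """Leg 1: the user approves the client and a one-time code is issued."""
        code = secrets.token_urlsafe(16)
        self._codes[code] = (user, client_id, scope)
        return code

    def exchange(self, code: str, client_id: str) -> str:
        """Leg 2: the client swaps the one-time code for an access token."""
        user, issued_to, scope = self._codes.pop(code)  # KeyError if reused
        if issued_to != client_id:
            raise PermissionError("code was issued to a different client")
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (user, scope)
        return token

    def introspect(self, token: str) -> Optional[Tuple[str, str]]:
        """Return (user, scope) for a known token, else None."""
        return self._tokens.get(token)


class ProtectedService:
    """A service run separately from the provider; it only asks it to check tokens."""

    def __init__(self, provider: AuthProvider, required_scope: str) -> None:
        self._provider = provider
        self._required_scope = required_scope

    def call(self, token: str) -> str:
        """Leg 3: the client presents the token to the service."""
        grant = self._provider.introspect(token)
        if grant is None or grant[1] != self._required_scope:
            raise PermissionError("access denied")
        return f"result for {grant[0]}"


provider = AuthProvider()
service = ProtectedService(provider, required_scope="niche-modelling")
code = provider.authorize("alice", "taverna-client", "niche-modelling")
token = provider.exchange(code, "taverna-client")
print(service.call(token))  # result for alice
```

The service never sees the user's credentials and holds no user database of its own; it only needs a way to ask the provider whether a token is valid for its scope.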
15. Questions?
BioVeL is funded by the
European Commission
7th Framework Programme (FP7).
It is part of its e-Infrastructures activity.
BioVeL contributes to LifeWatch and GEO BON.
BioVeL products are free to access.
Under FP7, the e-Infrastructures activity is part of the Research Infrastructures programme,
funded under the FP7 'Capacities' Specific Programme. It focuses on the further development
and evolution of the high-capacity and high-performance communication network (GÉANT),
distributed computing infrastructures (grids and clouds), supercomputer infrastructures,
simulation software, scientific data infrastructures, e-Science services as well as on the adoption
of e-Infrastructures by user communities.