BioCatalogue talk by Carole Goble. In these slides she outlines the reasons behind the BioCatalogue project and presents BioCatalogue and its goals.
6. Service and workflow analytics and network analysis: recommendations and co-use; social networks of third-party, externally hosted services; automated diagnostics, monitoring and metadata curation.
7. Finding and Curating Services. http://www.biocatalogue.org. Draws on 6 years' experience in Taverna of semantic annotation of services using RDF and OWL ontologies, and on experience at EBI in service provision. The first pilot, in early November 2008, will cover the major providers (EBI, NCBI, DDBJ) at “bronze” quality and show some at platinum.
11. Workflows and Services (diagram): four sources of curation, namely Curation by Experts, Social Curation by the Crowd, Self-Curation by Contributors and Automated Curation, each of which seeds, refines and validates annotations.
12. Multiple Annotation Profiles (diagram): User Profile, Service Profile and Group Profile, each built from profile annotations, with ranking functions over them.
13. Service Profile Curation Model (diagram): Service Model; Quantitative Content Model; Semantic Content Model (tags, ontologies); 6 facets: Functional, Operational, Operational Metrics, Conditions of Use, Social Standing and Provenance; versioning, QoS and usage.
14. Architecture (diagram): services described in WSDL, WADL, SAWSDL, SA-REST and "A.N. Other" / "S-A.N. Other" formats, executing at their host; service profiles with curation, a quantitative service model and a semantic content model; monitoring; analytics and ranking; finding by browse/shop and search, and customised services and workflows.
15. Service Profile Facets (interface neutral): Functional, Conditions of Use, Operational, Social Standing, Operational Metrics and Provenance.
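16. Service Profile Facets (diagram). The interface-neutral facets (Functional, Conditions of Use, Operational, Social Standing, Operational Metrics, Provenance) serve discovery, interoperability, composition and reuse. Services are multiply described: by third parties, through aggregated feeds and monitoring, and from multiple sources; they are dynamic, with multiple versions and multiple instances. Descriptions draw on trusted authorities and policies, and range from ontologies and controlled vocabularies to tags, free text and folksonomies, expressed using standards such as W*DL, Atom and schemas.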
17. Service Profile Facets, with ranking (diagram): the same interface-neutral facets and multiply described, dynamic services as slide 16, serving discovery, interoperability, composition and reuse, with trusted authorities and policies feeding ranking.
18. Pay as you Go, Emergent Curation: just enough, just in time, not just in case. What is the return on the investment? (Chart of gain against pain: folksonomy tagging and hard-core, full-on ontology curation at the extremes, with regions labelled "very bad", "good, but unlikely" and "just right"; the goal is metadata rich enough for effective reuse.)
31. Finding, curating and reusing workflows (myExperiment): connecting scientists in the wild. A supermarket for workflow users, a toolbox for workflow creators, social networking over commodities, across different disciplines. 1200+ members from 114 countries; 50,000+ workflow downloads; 1500 to 2000 unique visitors per month; 460+ workflows; 98 groups; 35+ packs. Running for just over a year. A joint Manchester and Southampton project; project leader Prof David De Roure.
35. BioCatalogue Team: Thomas Laurent, Hamish McWilliams, Franck Tanoh, Jiten Bhagat, Carole Goble, Rodrigo Lopez, Eric Nzuobontane.
The plan for this talk was to highlight what BioCatalogue is and to give a demo, but unfortunately the demo is not ready. Instead, I will use some screenshots to show what is really going on and what to expect next from BioCatalogue. Background to the talk: there are lots of databases and data resources; we have Feta, but we can't annotate all the services in it; hence BioCatalogue.
Services are methods too.
Fix, file and forget is curation, in a way. Assets are used, we hope, by applications and scientists who had anticipated using them, and by applications and scientists that had not, or in ways that were unanticipated.
Of course it isn't as clean as that, and everything is highly interrelated.
Workflows are combinations of services, and those services are external, not self-contained or isolated. Hence: service and workflow analytics and network analysis; service diagnostics and monitoring; automated curation.
Get service providers involved; get the community involved. There are 3500+ service operations, but only about 700 are annotated in Feta. We build on the myGrid Service Ontology, an annotation and curation pipeline, and curation and discovery tools. Other registries: DAS Registry, BioMOBY Central, SeekDa …
Scientists are naughty. Reuse is hard: we have to try services to find out what they do (IVOA referred to this too). "I used it last time, so it will work again the same way"... damn! Services change location, capabilities and signatures (BioMART changed its interface three times in 2006); new ones appear and existing ones disappear (SeqHound); they decay and become outdated or unreliable.
Services in the wild are frequently, er, disappointing and hard to use (Rubbish™). Writing reusable workflows is hard: local services, permissions, licences. What does it DO? Writing reusable services is hard: what does it DO? We are predicting the unknown required by the unknown. Finding workflows, services and tools is hard: where do you go? What does it DO? Creating web services is still a bottleneck; for quick solutions it is still seen as too much extra trouble.
Ruin, not fix, file and forget. Services are not deposited and preserved in software libraries. They need a rapid metadata heartbeat, especially for operational metadata. (Could use the previous slide from the DCC talk.) Shadows: method archives, records of what it was that can be used again; they are referred to. Services have no SLA obliging them to be stable or standard, so they constantly need tending or else they go stale (cf. IVOA service validation, DAS). Not software libraries. BioNanny, using Grid tools. Versioning of workflows (Andrea). Regular health checks. Use myExperiment to notify scientists of potential problems, and to be smart about which services should be monitored. Workflows are deposited, but they are not self-contained: they link to external services in flux, depend on software, or incorporate services unavailable to others. Workflow fragility leads to decay: workflows become plans and provenance rather than working scientific objects unless they are tended and updated.
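The regular health checks and "be smart about which services should be monitored" ideas can be illustrated with a minimal Python sketch. It is not BioNanny or any BioCatalogue or myExperiment code: the `Probe` record, the 0.8 availability threshold, and ranking services by how many workflows use them are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    service: str  # hypothetical service identifier
    ok: bool      # whether this scheduled health check succeeded


def availability(probes, service):
    """Fraction of recorded checks for `service` that succeeded (0.0 if none)."""
    results = [p.ok for p in probes if p.service == service]
    return sum(results) / len(results) if results else 0.0


def flag_stale(probes, threshold=0.8):
    """Services whose availability is below `threshold` (an assumed cut-off)."""
    services = sorted({p.service for p in probes})
    return [s for s in services if availability(probes, s) < threshold]


def monitoring_priority(workflow_uses):
    """Rank services by how many workflows use them, most-used first."""
    counts = {}
    for services in workflow_uses.values():
        for s in set(services):
            counts[s] = counts.get(s, 0) + 1
    return sorted(counts, key=lambda s: (-counts[s], s))


def affected_workflows(workflow_uses, stale):
    """Workflows calling at least one stale service, i.e. whose owners to notify."""
    stale = set(stale)
    return sorted(wf for wf, services in workflow_uses.items() if stale & set(services))
```

Ranking by workflow usage is one simple way to spend a limited monitoring budget on the services whose failure would break the most deposited workflows.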
In particular, a platform for research into curation practices, as in the panel today. Expert curation is library-like; suppliers and the crowd are the web side. Automated is
The group profile captures the interrelationships between the services: co-reference and co-use.
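Co-use, one ingredient of a group profile, can be sketched by counting which services appear together in the same workflows and recommending the most frequent partners. This is a hypothetical illustration that assumes workflows are available as lists of service names; it is not how BioCatalogue or myExperiment actually computes co-use.

```python
from collections import Counter
from itertools import combinations


def co_use(workflows):
    """Count how often each unordered pair of services appears in the same workflow."""
    pairs = Counter()
    for services in workflows.values():
        # Deduplicate within a workflow and sort so (a, b) and (b, a) are one pair.
        for a, b in combinations(sorted(set(services)), 2):
            pairs[(a, b)] += 1
    return pairs


def recommend(workflows, service, k=3):
    """Up to k services most often co-used with `service`, most frequent first."""
    scores = Counter()
    for (a, b), n in co_use(workflows).items():
        if a == service:
            scores[b] += n
        elif b == service:
            scores[a] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [s for s, _ in ranked[:k]]
```

Ties are broken alphabetically so that recommendations are deterministic.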
Curation includes versioning; analytics includes monitoring.
OAIS? From the model point of view; from the standoff annotation point of view. Metadata richness.
Skipped all but the core in the talk.
From the model point of view; from the standoff, model-neutral annotation point of view. Bronze, silver, gold and platinum compliance levels.
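To make the compliance levels concrete, here is a minimal sketch that grades a service profile by which of the six facets it fills in. The facet names follow the Service Profile slides, but the mapping of facets to bronze, silver, gold and platinum is entirely hypothetical; the talk does not say what each level requires.

```python
FACETS = ("functional", "operational", "operational_metrics",
          "conditions_of_use", "social_standing", "provenance")

# HYPOTHETICAL requirements: each level needs its own facets plus
# every facet required by the levels below it.
LEVELS = [
    ("bronze", {"functional"}),
    ("silver", {"operational", "conditions_of_use"}),
    ("gold", {"operational_metrics", "provenance"}),
    ("platinum", {"social_standing"}),
]


def compliance_level(profile):
    """Highest level whose cumulative facet requirements are met, else None.

    `profile` maps a facet name to its list of annotations; an empty or
    missing list counts as an unfilled facet.
    """
    filled = {f for f in FACETS if profile.get(f)}
    required = set()
    achieved = None
    for level, facets in LEVELS:
        required |= facets
        if not required <= filled:
            break
        achieved = level
    return achieved
```

Because requirements accumulate, a profile degrades gracefully: losing a higher-level facet drops it a level rather than out of the catalogue.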
Frankly, is it worth it to do the detailed stuff?
Richness spectrum. (Spoke to it, but probably should have skipped it.) The quality and completeness of metadata: graceful decay from platinum to bronze. Semantic Web services: the IVOA talk asked “why and when semantics?” Here is an answer. It leads to multiple pipelines and multiple …
Scientist, finding: simple classifications on a few properties; simple queries that reduce the search space, with the final decision left to the user; biological terms; heavy use of provenance, reputation, usage patterns, operational properties, example configurations and boring stuff like that. Think Amazon: the interface is the thing.
Automation (validation and execution): rich metadata for automatic service configuration, invocation and fault management; rich descriptions for reasoning about mismatches, debugging and repair; rich descriptions for reasoning about automated composition. Hard and time-consuming.
Joint project between Manchester and EBI.
Technical infrastructure. But it's still not all joined up! Feta keeps coming and going. Grid service descriptions are produced by annotating services with terms from the myGrid ontology and are stored in a central registry, GRIMOIRES. Services are found using the Feta discovery service [5]. We have piloted expert manual annotation tools augmented by automated tools that use information extraction techniques.
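Discovery over ontology annotations can be sketched as a subsumption query: asking for a general task also returns services annotated with more specific tasks. The concept names and parent map below are hypothetical stand-ins for the myGrid ontology, a plain dictionary replaces OWL reasoning, and none of this reproduces Feta's actual API.

```python
# HYPOTHETICAL fragment of a task ontology: child concept -> parent concept.
PARENT = {
    "blast_search": "similarity_search",
    "fasta_search": "similarity_search",
    "similarity_search": "sequence_analysis",
    "multiple_alignment": "sequence_analysis",
}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or lies beneath it (assumes no cycles)."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def discover(registry, query):
    """Names of services annotated with `query` or any more specific concept.

    `registry` maps a hypothetical service name to the task concept it was
    annotated with.
    """
    return sorted(name for name, task in registry.items() if is_a(task, query))
```

For example, with services annotated as `blast_search`, `fasta_search` and `multiple_alignment`, a query for `similarity_search` returns only the first two, while `sequence_analysis` returns all three.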
These are not our scientists or our projects; we have none. It's just scientists in the wild, 50% from the USA and UK. Google Analytics says: 1931 unique visitors from 3 September to 3 October; 1698 unique visitors from 3 August to 2 September. myExperiment currently has 1203 users, 98 groups, 460 workflows, 130 files and 36 packs.
Extreme Web 2.0. 18 months old. Built on Ruby on Rails; BSD License; source code hosted on RubyForge; publicly available. 2 core developers, 50% in Southampton and 50% in Manchester. User-driven design and development.
959 active users; 1429 unique IP visits in the last month; 82 groups; 248 group memberships; 296 workflow entries and 425 workflow versions; 101 files; 1382 taggings; 46,427 downloads; 77,393 viewings; 408 creditations; 12 packs (237 entries in total).
Towards repeatable, reproducible, comparable and reusable research