The Pistoia Alliance SESL Project aims to develop standards for biomedical knowledge brokering through a pilot project.
The pilot will develop an assertion database for type 2 diabetes combining data from structured sources and literature. It will establish APIs and a demonstrator interface. The goals are to streamline non-competitive workflows, engage stakeholders, and assess moving to a production "push model" of content delivery.
Assuming technical success, the project seeks input on opportunities for integrating literature into biomedical infrastructure, who benefits most from a push content model, how publication may be impacted, and challenges in further engaging publishers.
Developing Knowledge Brokering Standards for Biological Text and Data Integration: The Pistoia SESL Project
1. The Pistoia SESL Project
Developing Knowledge Brokering Standards for Biological Text and Data Integration
An Emerging Vehicle for Collaboration: The Pistoia Alliance
Ian Harrow, Wendy Filsell, Dietrich Rebholz-Schuhmann
ALPSP Workshop http://pistoiaalliance.org
11th May 2010
2. Pistoia Background – How it all started
Timeline (2007, 2008, 2009, Now):
• Informal meeting
• Met in Pistoia
• Create Pistoia as not-for-profit company
• Official launch
• 7 of top 10 Pharma as members
• 33 members
Timeline labels: Stanhope Gate; Lhasa; Curzon; Pistoia Domains established; Informal collaborations; Collaboration/project meeting
Pistoia Description
The primary purpose of the Pistoia Alliance is to streamline non-competitive elements of the life science workflow by the specification of common standards, business terms, relationships and processes
History
• Initial meeting with GSK, AZ, Pfizer and Novartis – outlined similar challenges and frustrations in the Informatics sector of Discovery
• The advent of Web Services and Web2.0 allows for decoupling of proprietary data from technology
• Publicly available structural and biological DBs allow for a non-IP related analysis and serve as a scientific test suite
• Sponsorship from R&D IS heads within Life Science industry
Pistoia Goals
• to allow this framework to encompass/support most pre-competitive work between the organisations
• to support life science workflow prior to submission
• to work with other Standards organisations
3. Pistoia Domains
Pistoia Domains group areas of interest, scope out and deliver projects
Pistoia Groups – as defined in byelaws
• Board of Directors
• Officers (Operational Team)
• Technical Committee: Provide expertise for WGs and running Pistoia
• Pistoia Members: Define:
– Requirements
– Technical Standards
– Service Standards
Pistoia Domain – high level collection of Working Groups with common themes
• Domain Steering Groups: Allows governance across a domain using Working Group chairs and Technical Committee reps
• Working Groups: The main project delivery mechanism in Pistoia. All standards will be delivered by WGs
External Groups outside of Pistoia could:
• join Pistoia
• influence Pistoia members
• influence through other standards groups and activities
• collaborate on standards’ feasibility studies
• collaborate through non-Pistoia Standards initiatives
4. Pistoia Domains
Pistoia Domains focus on business workflows/supply chains
Enabling Knowledge and Information Services
Vocabulary
Visualisation
Application Integration
Workflow
Biology Data Services
Chemistry Data Services
Translational Data Services
Others
5. SESL: Biomedical Knowledge Brokering
• Challenge:
– No single system for retrieving gene to disease relationships contained in
both published & biological database content
– Need a ‘push model’ for biomedical knowledge access: the current model
requires the consumer to search 1000s of content sources
• Opportunity: Pilot Project with key stakeholders
– Pilot a ‘push model’ for biomedical knowledge brokering
– Engage multiple consumers, content providers and a single, public group to
develop the necessary infrastructure to explore the standards required for
the model to work in production
• History:
– May 2008: Common Disease Knowledge Environment (CDKE) IMI call drafted
– Sep 2008: postponed call publication
– Jan 2009: x-pharma meeting in London on how to progress CDKE
– Apr 2009: CDKE presented at SESL workshop
– Oct 2009: SESL Pilot meeting (funders)
– Jan 2010: Pilot launch
6. The Knowledge Service Framework
[Figure: layered architecture, from consumers at the top to suppliers at the bottom]
• Multiple Consumers, behind the ‘Consumer’ Firewall: Disease Dossier Knowledge Applications
• Common Service Broker: Service Layer (Open Stds, Std Public Vocabularies), Assertion & Meta Data Mgmt, Transform / Translate, Business Rules, Integrator
• Content Suppliers, behind the Supplier Firewall: Corpus 1, Db 2, Db 3, Db 4, Corpus 5
• Annotation: Effort required to fit DBs to service layer
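A minimal Python sketch of the push model in this framework, under stated assumptions: the Broker class, its method names, the vocabulary mapping and the sample records are all hypothetical and are not taken from the SESL specification. It shows only the idea that the broker translates supplier terms into a shared vocabulary and pushes assertions to subscribed consumer applications, so the consumer does not have to search each source.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Assertion:
    """A subject-predicate-object statement plus the source it came from."""
    subject: str
    predicate: str
    obj: str
    source: str  # supplier database or corpus (illustrative labels only)


class Broker:
    """Hypothetical broker: normalises supplier records and pushes them on."""

    def __init__(self, vocabulary: Dict[str, str]) -> None:
        # Maps supplier-specific terms onto one shared public vocabulary.
        self.vocabulary = vocabulary
        self.subscribers: Dict[str, List[Callable[[Assertion], None]]] = defaultdict(list)

    def subscribe(self, disease: str, callback: Callable[[Assertion], None]) -> None:
        """Register a consumer application for assertions about one disease."""
        self.subscribers[disease].append(callback)

    def publish(self, raw: Assertion) -> int:
        """Translate a supplier assertion, push it, return the delivery count."""
        normalised = Assertion(
            subject=self.vocabulary.get(raw.subject, raw.subject),
            predicate=raw.predicate,
            obj=self.vocabulary.get(raw.obj, raw.obj),
            source=raw.source,
        )
        callbacks = self.subscribers.get(normalised.obj, [])
        for callback in callbacks:
            callback(normalised)
        return len(callbacks)


# Illustrative vocabulary and records, not pilot data.
broker = Broker({"T2D": "type 2 diabetes", "Type II Diabetes": "type 2 diabetes"})
dossier: List[Assertion] = []  # stands in for a disease dossier application
broker.subscribe("type 2 diabetes", dossier.append)

delivered = broker.publish(Assertion("TCF7L2", "associated_with", "T2D", "Db 2"))
ignored = broker.publish(Assertion("BRCA1", "associated_with", "breast cancer", "Corpus 1"))
```

In this sketch the consumer registers once and receives only normalised assertions for its disease of interest; anything outside the subscription is not delivered.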
7. A Production Service ...
[Figure: production architecture with several brokers between the consumer side and the supplier side]
• Consumer Side: Exemplar Disease Dossier Application
• License
• Brokers: Broker Org #1, Broker Org #2, Broker Org #3, Internal Broker. Each has its own Service Layer (Std Public Vocabularies), Assertion & Meta Data Mgmt, Transform / Translate, Business Rules and Integrator
• License
• Supplier Side: Corpus 1, Db 2, Db 3, Corpus 4, Corpus 5, Db 6, Db 7, Corpus 8, Corpus 9, Db 10, Db 11, Corpus 12, Corpus 13, Db 14, Db 15, Corpus 16
9. The Pilot
• Deliverables:
– Publication of standards & recommendations for service implementation
– Pilot implementation of service for a single disease (assertions from pre-defined
document sets & databases)
– Establish ways of working precompetitively across industry/vendor/academia
– Dialogue and assessment of cost / value, with key content suppliers in moving to
such a push model for content (viability of moving to production)
• Status:
– AZ, Pfizer, GSK, Roche, Unilever, EBI, NPG, OUP, Elsevier & RSC
– 12 month project, £200K direct funding (+ PM & Architecture support)
– Contract between Pistoia & EBI signed 20th January 2010 for 1 year
• Scope:
– Development of an assertion database in combination with a user interface and
associated web services for one disease/indication/phenotype of broad interest:
Type II Diabetes
– Assertional content derived from 3 structured data sources and limited Journal
content (co-occurrence & statistical derivation from full text)
– Assertional evidence for filtering and drill down to primary data.
– Limited vocabulary development for area of focus: Type II Diabetes
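The scope mentions assertions derived by co-occurrence and statistical derivation from full text, with evidence for drill down. A rough Python sketch of one way this could look follows. It assumes sentence-level co-occurrence scored with pointwise mutual information, which is a common choice but not necessarily the pilot's method; the function names and sample sentences are invented for illustration.

```python
import math
import re
from collections import Counter
from itertools import product


def mentions(term, text):
    """True if the term occurs in the text as a whole word or phrase."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


def cooccurrence_assertions(sentences, genes, diseases):
    """Count gene-disease co-occurrences per sentence and score them with PMI."""
    total = len(sentences)
    gene_hits, disease_hits = Counter(), Counter()
    pair_evidence = {}
    for idx, sentence in enumerate(sentences):
        found_genes = [g for g in genes if mentions(g, sentence)]
        found_diseases = [d for d in diseases if mentions(d, sentence)]
        gene_hits.update(found_genes)
        disease_hits.update(found_diseases)
        for pair in product(found_genes, found_diseases):
            # Keep sentence indices so a user can drill down to the evidence.
            pair_evidence.setdefault(pair, []).append(idx)
    results = {}
    for (gene, disease), evidence in pair_evidence.items():
        # PMI = log2( p(gene, disease) / (p(gene) * p(disease)) )
        pmi = math.log2(len(evidence) * total / (gene_hits[gene] * disease_hits[disease]))
        results[(gene, disease)] = {"count": len(evidence), "pmi": pmi, "evidence": evidence}
    return results


# Invented example sentences, not publisher content.
sentences = [
    "TCF7L2 variants are linked to type 2 diabetes.",
    "PPARG is a target in type 2 diabetes therapy.",
    "TCF7L2 regulates Wnt signalling.",
    "Obesity is a risk factor for type 2 diabetes.",
]
results = cooccurrence_assertions(sentences, ["TCF7L2", "PPARG"], ["type 2 diabetes"])
```

Each result keeps the indices of the supporting sentences, which is the minimum needed for filtering and drill down to primary data.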
10. Timelines: Development Phase
Gantt chart, Jan-10 (Month 0) to Feb-11.
• Deliverable 1 (Month 4): Finalised Technical Specification document
• Development tasks:
– Task 2: Build vocabularies within scope
– Task 3: RDF data export from UniProt and Ensembl
– Task 4: RDF data export of Array Express
– Task 6: Extract literature assertions for T2DB from publishers’ content
– Task 7: Develop RDF triple store schema and demonstrator
– Task 8: Develop query definitions
– Task 9: Establish API services for remote access
– Task 10: Develop simple user interface for demonstrator (based on mock-up)
– Task 11: Write documentation that defines the standard framework
• Deliverables 2 & 3 (Month 7 & 8): Access to early prototype demonstrator and report
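The RDF export, triple store and query tasks listed for the development phase can be illustrated with a small Python sketch. The namespace, predicate names and triples here are hypothetical and the pilot's actual schema is not described in these slides. The sketch only shows assertions held as subject-predicate-object triples, written out in N-Triples syntax, and selected with a simple pattern match.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
BASE = "http://example.org/sesl/"  # hypothetical namespace, for illustration only


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialise triples as N-Triples: URIs in angle brackets, literals quoted."""
    lines = []
    for subject, predicate, obj in triples:
        if obj.startswith("http://"):
            rendered = f"<{obj}>"
        else:
            # Escape backslashes and quotes inside plain string literals.
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


def match(triples: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Illustrative triples, not pilot data.
triples = [
    (BASE + "gene/TCF7L2", BASE + "associatedWith", BASE + "disease/T2D"),
    (BASE + "gene/TCF7L2", BASE + "label", "TCF7L2"),
    (BASE + "gene/PPARG", BASE + "associatedWith", BASE + "disease/T2D"),
]
document = to_ntriples(triples)
linked = match(triples, p=BASE + "associatedWith", o=BASE + "disease/T2D")
```

A real deployment would more likely use a triple store with SPARQL for the query definitions; the pattern match above is a stand-in for that step.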
11. Timelines: Testing and Communication Phase
Gantt chart, Jan-10 (Month 0) to Feb-11.
• Development tasks:
– Task 7: Develop RDF triple store schema and demonstrator
– Task 8: Develop query definitions
– Task 9: Establish API services for remote access
– Task 10: Develop simple user interface for demonstrator (based on mock-up)
– Task 11: Write documentation that defines the standard framework
• Deliverables 2 & 3 (Month 7 & 8): Access to early prototype demonstrator and report
• Testing and communication tasks:
– Task 12: Tests of the demonstrator (full private and limited public instance)
– Task 13: Deploy public demonstrator
– Task 14: Write publication for standard definition
– Task 15: Develop recommendations for post-pilot project
• Deliverables 4 & 5 (Month 11 & 12): Final prototype demonstrator, recommendations post-pilot and report
• Deliverable 6 (Month 13): Public release of limited demonstrator
12. Acknowledgements
Industry
• Ian Dix
• Nick Lynch
• Ashley George
• Mike Westaway
• Ian Stott
• Nigel Wilkinson
• Michael Braxenthaler
• Catherine Marshall
Content Providers
• Claire Bird – OUP
• Richard O’Bierne – OUP
• Jabe Wilson – Elsevier
• Bradley Allen – Elsevier
• Colin Batchelor – RSC
• Richard Kidd – RSC
• David Hoole – NPG
• Alf Eaton – NPG
EBI
• Cath Brooksbank
• Dominic Clark
• Christoph Grabmueller
• Silvestras Kavaliauskas
• Roderigo Lopez
• Jo McEntyre
• Janet Thornton
14. Questions.....
• Assuming a successful technical outcome from
the SESL experiment by year end...
– What opportunities does SESL bring to you?
– Do you benefit from full integration of the literature
into a biomedical infrastructure?
– Who would gain most from a push model?
– Does the publication process benefit from this new
service model?
– How might it change how you do business?
– What challenges do you foresee?
– How can we reach out further to publishers?