Scholarly Communication for Bioinformatics Students

The Changing Face of Scholarly Communication and the Opportunities it Affords the Bioinformatics/Systems Biology Student
Philip E. Bourne
University of California San Diego
pbourne@ucsd.edu
http://www.sdsc.edu/pb
Third UCSD Bioinformatics and Systems Biology Expo – 2/28/2011

Observation 1: Everyone in this Room is Driven by One Thing Above All Else

Observation 2: We Are a Field That Uses/Produces Public On-Line Data Like No Other

Observation 3: We Have Shaped the Way Data Are Shared – We Have Had Very Little Impact on Publications

Perhaps it is Time We Thought Less About a Publication as a Reward and More About How it Can be Presented to Maximize its Use

So What Needs to Happen
We need data and knowledge about that data to interoperate, i.e. we need new kinds of fast, versatile publications and data archives
We need to be more open with both
We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
Reward systems need to change
We need scientist management tools
We need to be less fixated on the big data problems
We need to unleash the full power of the Internet
[Slide graphic: a scale labeled Hard and Easy]

One Personal Example of Why This Needs to Happen Now

Josh Sommer – A Remarkable Young Man
Co-founder & Executive Director of the Chordoma Foundation
http://sagecongress.org/Presentations/Sommer.pdf

Chordoma
A rare form of brain cancer
No known drugs
Treatment – surgical resection followed by intense radiation therapy
Image: http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG

http://sagecongress.org/Presentations/Sommer.pdf

"If I have seen further it is only by standing on the shoulders of giants." Isaac Newton
From Josh’s point of view the climb up just takes too long: > 15 years and > $850M, to be more precise
Adapted from: http://sagecongress.org/Presentations/Sommer.pdf

http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation

So We Have Seen What Needs to Change and Why. What About the How?

We Need Data and Knowledge About That Data to Interoperate
The Knowledge and Data Cycle
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
[Figure labels around the cycle: User clicks on content; Metadata and webservices to data provide an interactive view that can be annotated; Selecting features provides a data/knowledge mashup; Analysis leads to new content I can share]
PLoS Comp. Biol. 2005 1(3) e34

We Need Data and Knowledge About That Data to Interoperate – What is Stopping Us?
Open Access
Governance – publishers vs. database providers
Reward
Metadata standards for provenance, privacy etc.
Exemplars
….

A Small Example - The World Wide Protein Data Bank
The single worldwide repository for data on the structure of biological macromolecules
Vital for drug discovery and the life sciences
39 years old
Free to all
http://www.wwpdb.org
PLoS Comp. Biol. 2005 1(3) e34

The World Wide Protein Data Bank – The Best Case Scenario
Paper not published unless data are deposited – strong data to literature correspondence
Highly structured data conforming to an extensive ontology
DOIs assigned to every structure
http://www.wwpdb.org
PLoS Comp. Biol. 2005 1(3) e34
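As a small illustration of what a DOI per structure makes possible, the sketch below builds a resolvable link from a PDB ID. It assumes the wwPDB DOI pattern 10.2210/pdb<id>/pdb and classic four-character IDs; the helper names pdb_doi and doi_url are hypothetical, and nothing is fetched over the network.

```python
import re

# Assumed pattern for wwPDB entry DOIs: 10.2210/pdb<four-character id>/pdb
_PDB_ID = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def pdb_doi(pdb_id: str) -> str:
    """Return the DOI string for a four-character PDB ID (hypothetical helper)."""
    if not _PDB_ID.match(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.lower()}/pdb"


def doi_url(doi: str) -> str:
    """Turn a DOI into a link that the doi.org resolver can follow."""
    return f"https://doi.org/{doi}"


# 1TIM is the structure used in the database-view example URL in this talk
print(doi_url(pdb_doi("1TIM")))
```

A paper or database page that carries such a link can be connected back to the deposited structure by a program, not only by a reader.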

Example Interoperability: The Database View
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
BMC Bioinformatics 2010 11:220

Example Interoperability: The Literature View
http://biolit.ucsd.edu
Nucleic Acids Research 2008 36(S2) W385-389

ICTP Trieste, December 10, 2007

Semantic Tagging & Widgets are a Powerful Tool to Integrate Data and Knowledge of that Data, But as Yet Not Used Much
Will Widgets and Semantic Tagging Change Computational Biology? PLoS Comp. Biol. 6(2) e1000673

Semantic Tagging of Database Content in The Literature or Elsewhere
http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp
PLoS Comp. Biol. 6(2) e1000673
[Figure label: Semantic Tagging]


The Publishers are Starting to Do It
From Anita de Waard, Elsevier

This is Literature Post-processing
Better to Get the Authors Involved
Authors are the absolute experts on the content
More effective distribution of labor
Add metadata before the article enters the publishing process

Word 2007 Add-in for authors
Allows authors to add metadata as they write, before they submit the manuscript
Authors are assisted by automated term recognition: OBO ontologies, database IDs
Metadata are embedded directly into the manuscript document via XML tags, OOXML format: open, machine-readable
Open source, Microsoft Public License
http://www.codeplex.com/ucsdbiolit
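To make the idea of embedding metadata as XML tags concrete, here is a minimal sketch of term recognition followed by tagging. It is not the add-in's code and does not produce OOXML: the element name term, the attributes source and id, the function tag_terms, and the ontology identifier EXAMPLE:0001 are all hypothetical placeholders chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET


def tag_terms(text, known_terms):
    """Wrap each recognized term in a <term> element (hypothetical markup).

    known_terms maps a surface string to a (source, identifier) pair.
    """
    para = ET.Element("p")
    para.text = ""
    if not known_terms:
        para.text = text
        return para
    # Longest terms first so a short term cannot pre-empt a longer one
    ordered = sorted(known_terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    last, pos = None, 0
    for match in pattern.finditer(text):
        plain = text[pos:match.start()]
        if last is None:
            para.text += plain   # text before the first tagged term
        else:
            last.tail = plain    # text between two tagged terms
        source, ident = known_terms[match.group(0)]
        last = ET.SubElement(para, "term", {"source": source, "id": ident})
        last.text = match.group(0)
        pos = match.end()
    if last is None:
        para.text += text[pos:]
    else:
        last.tail = text[pos:]
    return para


sentence = "The structure 1TIM is a triosephosphate isomerase."
terms = {
    "1TIM": ("PDB", "1TIM"),
    # Placeholder identifier, not a real ontology accession
    "triosephosphate isomerase": ("OBO", "EXAMPLE:0001"),
}
print(ET.tostring(tag_terms(sentence, terms), encoding="unicode"))
```

Because the tags travel inside the manuscript, a publisher or database can read the identifiers directly instead of re-deriving them by text mining after publication.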

Challenges
Authors' carrot: if one or more publishers fast-tracked a paper that had semantic markup, it might catch on
Publishers' carrot: competitive advantage

The Promise – A Hypothetical Example
[Figure labels: Cardiac Disease Literature; Immunology Literature; Shared Function]

High-throughput Biology Requires High-throughput Knowledge Discovery
Consider an Example from Our Own Work…
Roger Chang Will Give You Another Example

The TB-Drugome
1. Determine the TB structural proteome
2. Determine all known drug binding sites from the PDB
3. Determine which of the sites found in 2 exist in 1
4. Call the result the TB-drugome
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
High-throughput Data Requires High-throughput Knowledge

1. Determine the TB Structural Proteome
TB proteome: 3,996
Solved structures: 284
Homology models: 1,446
No structure or model: 2,266
High quality homology models from ModBase (http://modbase.compbio.ucsf.edu) increase structural coverage from 7.1% to 43.3%
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
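The coverage figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes that 3,996 is the size of the TB proteome, 284 the number of solved structures and 1,446 the number of homology models; that pairing of numbers to labels is read off the slide and is an assumption, supported by the fact that it reproduces the quoted percentages.

```python
# Counts from the slide; which number belongs to which label is an assumption
TB_PROTEOME = 3996
SOLVED_STRUCTURES = 284
HOMOLOGY_MODELS = 1446


def coverage(covered: int, total: int) -> float:
    """Percentage of the proteome that has structural coverage."""
    return 100.0 * covered / total


before = coverage(SOLVED_STRUCTURES, TB_PROTEOME)
after = coverage(SOLVED_STRUCTURES + HOMOLOGY_MODELS, TB_PROTEOME)
uncovered = TB_PROTEOME - SOLVED_STRUCTURES - HOMOLOGY_MODELS

print(f"{before:.1f}% -> {after:.1f}%, {uncovered} proteins without coverage")
# prints "7.1% -> 43.3%, 2266 proteins without coverage"
```

The remainder, 2,266, matches the fourth number on the slide, which suggests it counts the proteins with neither a solved structure nor a model.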

2. Determine all Known Drug Binding Sites in the PDB
Searched the PDB for protein crystal structures bound with FDA-approved drugs
268 drugs bound in a total of 931 binding sites
[Figure: No. of drugs vs. No. of drug binding sites; labeled drugs: Acarbose, Darunavir, Alitretinoin, Conjugated estrogens, Chenodiol, Methotrexate]
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976

Map 2 onto 1 – The TB-Drugome
http://funsite.sdsc.edu/drugome/TB/
[Figure: similarities between the binding sites of M.tb proteins (blue) and binding sites containing approved drugs (red)]
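The mapping step can be pictured as a bipartite graph linking drugs to the TB proteins whose binding sites resemble a known drug binding site. The sketch below is an illustration only: the similarity hits are made-up placeholders, the binding-site comparison itself is not implemented, and the function names are hypothetical.

```python
from collections import defaultdict

# Toy similarity hits as (drug, TB protein) pairs; entirely made-up placeholders
hits = [
    ("drug_A", "protein_1"),
    ("drug_A", "protein_2"),
    ("drug_B", "protein_2"),
    ("drug_C", "protein_3"),
    ("drug_A", "protein_2"),  # repeated hit, e.g. from a second binding site
]


def build_drugome(pairs):
    """Collect the set of putative TB targets for each drug."""
    drugome = defaultdict(set)
    for drug, protein in pairs:
        drugome[drug].add(protein)
    return dict(drugome)


def multi_target_drugs(drugome):
    """Drugs linked to more than one TB protein."""
    return sorted(d for d, targets in drugome.items() if len(targets) > 1)


def most_connected(drugome, n=5):
    """The n drugs with the most putative targets, ties broken by name."""
    return sorted(drugome, key=lambda d: (-len(drugome[d]), d))[:n]


drugome = build_drugome(hits)
print(multi_target_drugs(drugome))
print(most_connected(drugome, n=2))
```

Storing targets as sets means repeated hits between the same drug and protein count once, so the degree of a drug is its number of distinct putative targets.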

From a Drug Repositioning Perspective
Similarities between drug binding sites and TB proteins are found for 61/268 drugs
41 of these drugs could potentially inhibit more than one TB protein
[Figure: No. of drugs vs. No. of potential TB targets; labeled drugs: conjugated estrogens & methotrexate, chenodiol, levothyroxine, testosterone, raloxifene, alitretinoin, ritonavir]
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976

Top 5 Most Highly Connected Drugs

We Need Better Ways to Associate Data and Knowledge, and it's More Than Just Text Mining of PubMed Abstracts – it's About Changing the System
Our Future is in Your Hands!

Acknowledgements
BioLit Team: Lynn Fink, Parker Williams, Marco Martinez, Rahul Chandran, Greg Quinn
Microsoft Scholarly Communications: Pablo Fernicola, Lee Dirks, Savas Parastatidis, Alex Wade, Tony Hey
RCSB PDB team: Andreas Prlic, Dimitris Dimitropoulos
TB Drugome Team: Lei Xie, Sarah Kinnings, Li Xie
http://funsite.sdsc.edu/drugome/TB/
http://biolit.ucsd.edu
http://www.pdb.org
http://www.codeplex.com/ucsdbiolit
