The Changing Face of Scholarly Communication and the Opportunities it Affords the Bioinformatics/Systems Biology Student. Philip E. Bourne, University of California San Diego, pbourne@ucsd.edu, http://www.sdsc.edu/pb. Third UCSD Bioinformatics and Systems Biology Expo – 2/28/2011
Observation 1: Everyone in this Room is Driven by One Thing Above All Else
Observation 2: We Are a Field That Uses/Produces Public On-Line Data Like No Other
Observation 3: We Have Shaped the Way Data Are Shared – We Have Had Very Little Impact on Publications
Perhaps it is Time We Thought Less About a Publication as a Reward and More About How it Can be Presented to Maximize its Use
So What Needs to Happen
  • We need data and knowledge about that data to interoperate, i.e. we need new kinds of fast, versatile publications and data archives
  • We need to be more open with both
  • We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
  • Reward systems need to change
  • We need scientist management tools
  • We need to be less fixated on the big data problems
  • We need to unleash the full power of the Internet
[Slide graphic: a scale labeled Hard and Easy alongside the list]
One Personal Example of Why This Needs to Happen Now
Josh Sommer – A Remarkable Young Man. Co-founder & Executive Director of the Chordoma Foundation http://sagecongress.org/Presentations/Sommer.pdf
Chordoma
  • A rare form of brain cancer
  • No known drugs
  • Treatment – surgical resection followed by intense radiation therapy
Image: http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG
"If I have seen further it is only by standing on the shoulders of giants" (Isaac Newton). From Josh’s point of view the climb up just takes too long: > 15 years and > $850M, to be more precise. Adapted: http://sagecongress.org/Presentations/Sommer.pdf
http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
So We Have Seen What Needs to Change and Why. What About the How?
We Need Data and Knowledge About That Data to Interoperate: The Knowledge and Data Cycle
  0. Full text of PLoS papers stored in a database
  1. A link brings up figures from the paper
  2. Clicking the paper figure retrieves data from the PDB which is analyzed
  3. A composite view of journal and database content results
  4. The composite view has links to pertinent blocks of literature text and back to the PDB
[Figure labels: User clicks on content; Metadata and web services to data provide an interactive view that can be annotated; Selecting features provides a data/knowledge mashup; Analysis leads to new content I can share]
PLoS Comp. Biol. 2005 1(3) e34
We Need Data and Knowledge About That Data to Interoperate – What is Stopping Us? Open Access Governance – publishers vs. database providers; Reward; Metadata standards for provenance, privacy etc.; Exemplars; ….
A Small Example - The World Wide Protein Data Bank
  • The single worldwide repository for data on the structure of biological macromolecules
  • Vital for drug discovery and the life sciences
  • 39 years old
  • Free to all
http://www.wwpdb.org
PLoS Comp. Biol. 2005 1(3) e34
The World Wide Protein Data Bank – The Best Case Scenario
  • Paper not published unless data are deposited – strong data to literature correspondence
  • Highly structured data conforming to an extensive ontology
  • DOIs assigned to every structure
http://www.wwpdb.org
PLoS Comp. Biol. 2005 1(3) e34
Example Interoperability: The Database View www.rcsb.org/pdb/explore/literature.do?structureId=1TIM BMC Bioinformatics 2010 11:220
Example Interoperability: The Literature View http://biolit.ucsd.edu Nucleic Acids Research 2008 36(S2) W385-389
ICTP Trieste, December 10, 2007
Semantic Tagging & Widgets are a Powerful Tool to Integrate Data and Knowledge of that Data, But as Yet Not Used Much. "Will Widgets and Semantic Tagging Change Computational Biology?" PLoS Comp. Biol. 6(2) e1000673
Semantic Tagging of Database Content in The Literature or Elsewhere http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp PLoS Comp. Biol. 6(2) e1000673
The Publishers are Starting to Do It (from Anita de Waard, Elsevier)
This is Literature Post-processing. Better to Get the Authors Involved
  • Authors are the absolute experts on the content
  • More effective distribution of labor
  • Add metadata before the article enters the publishing process
Word 2007 Add-in for authors
  • Allows authors to add metadata as they write, before they submit the manuscript
  • Authors are assisted by automated term recognition: OBO ontologies, database IDs
  • Metadata are embedded directly into the manuscript document via XML tags, OOXML format: open, machine-readable
  • Open source, Microsoft Public License
http://www.codeplex.com/ucsdbiolit
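As a rough illustration of what "metadata embedded via XML tags" could look like, the sketch below wraps a recognized term in an element that carries a database identifier. The element and attribute names (`p`, `term`, `source`, `id`) and the helper `tag_term` are assumptions made for this example; they are not the add-in's actual OOXML schema.

```python
import xml.etree.ElementTree as ET


def tag_term(sentence: str, term: str, source: str, term_id: str) -> ET.Element:
    """Wrap the first occurrence of `term` in a <term> element (hypothetical schema)."""
    before, found, after = sentence.partition(term)
    para = ET.Element("p")
    if not found:
        # Term not present: return the sentence untagged.
        para.text = sentence
        return para
    para.text = before
    tagged = ET.SubElement(para, "term", {"source": source, "id": term_id})
    tagged.text = term
    tagged.tail = after
    return para


# 1TIM is the PDB entry used in the database-view example above.
para = tag_term(
    "We analyzed the structure of triosephosphate isomerase.",
    "triosephosphate isomerase",
    "PDB",
    "1TIM",
)
xml_text = ET.tostring(para, encoding="unicode")
print(xml_text)
```

Because the identifier travels with the text, a downstream reader or publisher could resolve the tagged term back to the database entry without text mining.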
Challenges
  • Authors. Carrot: if one or more publishers fast-tracked a paper that had semantic markup, it might catch on
  • Publishers. Carrot: competitive advantage
The Promise – A Hypothetical Example
[Figure labels: Cardiac Disease Literature; Immunology Literature; Shared Function]
High-throughput Biology Requires High-throughput Knowledge Discovery. Consider an Example from Our Own Work… Roger Chang Will Give You Another Example.
The TB-Drugome
  1. Determine the TB structural proteome
  2. Determine all known drug binding sites from the PDB
  3. Determine which of the sites found in 2 exist in 1
  4. Call the result the TB-drugome
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
High-throughput Data Requires High-throughput Knowledge
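The four steps can be sketched on toy data as follows. This is only an outline under stated assumptions: each binding site is reduced to a set of residue labels, and Jaccard overlap with a 0.5 threshold stands in for a real binding-site comparison method. The function names, the threshold and every data value are hypothetical and are not taken from the paper.

```python
def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap of two label sets; a stand-in for a real binding-site similarity score."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_drugome(tb_sites: dict, drug_sites: dict, threshold: float = 0.5) -> dict:
    """Step 3: for each drug site, list the TB proteins that have a similar site."""
    drugome = {}
    for drug, d_site in drug_sites.items():
        hits = sorted(
            protein
            for protein, t_site in tb_sites.items()
            if jaccard(d_site, t_site) >= threshold
        )
        if hits:
            drugome[drug] = hits
    return drugome


# Step 1 (toy): sites from the TB structural proteome.
tb_sites = {
    "prot_a": frozenset({"H", "D", "S", "G"}),
    "prot_b": frozenset({"K", "E", "R"}),
    "prot_c": frozenset({"H", "D", "S"}),
}
# Step 2 (toy): drug binding sites taken from solved structures.
drug_sites = {
    "drug_x": frozenset({"H", "D", "S"}),
    "drug_y": frozenset({"W", "F"}),
}
# Step 4: the resulting mapping is the toy "drugome".
drugome = build_drugome(tb_sites, drug_sites)
print(drugome)
```

Here `drug_x` pairs with `prot_a` and `prot_c`, while `drug_y` has no similar site and is left out.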
1. Determine the TB Structural Proteome
[Figure: TB proteome 3,996 proteins; solved structures 284; homology models 1,446; remainder 2,266]
High quality homology models from ModBase (http://modbase.compbio.ucsf.edu) increase structural coverage from 7.1% to 43.3%
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
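The coverage figures follow directly from the counts on the slide. A minimal check, using only those counts (the variable and function names are ours):

```python
PROTEOME_SIZE = 3996   # proteins in the TB proteome
SOLVED = 284           # proteins with solved structures
MODELS = 1446          # proteins covered by high quality homology models


def coverage(n_proteins: int, total: int = PROTEOME_SIZE) -> float:
    """Structural coverage as a percentage, rounded to one decimal place."""
    return round(100.0 * n_proteins / total, 1)


before = coverage(SOLVED)
after = coverage(SOLVED + MODELS)
uncovered = PROTEOME_SIZE - SOLVED - MODELS
print(before, after, uncovered)  # 7.1 43.3 2266
```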
2. Determine all Known Drug Binding Sites in the PDB
  • Searched the PDB for protein crystal structures bound with FDA-approved drugs
  • 268 drugs bound in a total of 931 binding sites
[Figure: No. of drugs plotted against No. of drug binding sites; labeled drugs: Acarbose, Darunavir, Alitretinoin, Conjugated estrogens, Chenodiol, Methotrexate]
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
Map 2 onto 1 – The TB-Drugome http://funsite.sdsc.edu/drugome/TB/ Similarities between the binding sites of M.tb proteins (blue),  and binding sites containing approved drugs (red).
From a Drug Repositioning Perspective
  • Similarities between drug binding sites and TB proteins are found for 61/268 drugs
  • 41 of these drugs could potentially inhibit more than one TB protein
[Figure: No. of drugs plotted against No. of potential TB targets; labeled drugs: conjugated estrogens & methotrexate, chenodiol, levothyroxine, testosterone, raloxifene, alitretinoin, ritonavir]
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
Top 5 Most Highly Connected Drugs
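Ranking drugs by connectivity amounts to counting, for each drug, the distinct TB proteins it is linked to in the drug-target network. The sketch below does this on a made-up edge list; the drug and protein names are placeholders, and `rank_drugs` and `multi_target_drugs` are hypothetical helpers rather than anything from the published analysis.

```python
from collections import defaultdict


def targets_by_drug(edges):
    """Group (drug, protein) pairs into drug -> set of distinct proteins."""
    targets = defaultdict(set)
    for drug, protein in edges:
        targets[drug].add(protein)
    return targets


def rank_drugs(edges, top_n=5):
    """Return the top_n (drug, target count) pairs, most connected first."""
    targets = targets_by_drug(edges)
    # Sort by descending target count, then by name so ties are deterministic.
    ranked = sorted(targets.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(drug, len(proteins)) for drug, proteins in ranked[:top_n]]


def multi_target_drugs(edges):
    """Drugs linked to more than one protein."""
    return sorted(d for d, p in targets_by_drug(edges).items() if len(p) > 1)


# Toy edge list; the repeated ("drug_a", "p1") pair is counted once.
edges = [
    ("drug_a", "p1"), ("drug_a", "p2"), ("drug_a", "p3"),
    ("drug_b", "p1"),
    ("drug_c", "p2"), ("drug_c", "p4"),
    ("drug_a", "p1"),
]
top = rank_drugs(edges, top_n=2)
multi = multi_target_drugs(edges)
print(top, multi)
```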
We Need Better Ways to Associate Data and Knowledge, and it's More than Just Text Mining of PubMed Abstracts – It's About Changing the System. Our Future is in Your Hands!
Acknowledgements
  • BioLit Team: Lynn Fink, Parker Williams, Marco Martinez, Rahul Chandran, Greg Quinn
  • Microsoft Scholarly Communications: Pablo Fernicola, Lee Dirks, Savas Parastitidas, Alex Wade, Tony Hey
  • RCSB PDB team: Andreas Prilc, Dimitris Dimitropoulos
  • TB Drugome Team: Lei Xie, Sarah Kinnings, Li Xie
http://funsite.sdsc.edu/drugome/TB/ http://biolit.ucsd.edu http://www.pdb.org http://www.codeplex.com/ucsdbiolit
pbourne@ucsd.edu Questions?

Scholarly Communication for Bioinformatics Students


Editor's Notes

  1. 3,996 proteins in the TB proteome. 749 solved structures in the PDB, representing a total of 284 proteins (7.1% coverage). ModBase contains homology models for the entire TB proteome. 1,446 ‘high quality’ homology models were added to the data set. Structural coverage increased to 43.3%. Retained only those models with a model score of > 0.7 and a ModPipe quality score of > 1.1 (2,818 models). There were multiple models per protein. For each TB protein, chose the model with the best model score, and if they were equal, chose the model with the best ModPipe quality score (1,703 models). However, 251 (+6) models were removed since they correspond to TB proteins that already have solved structures (1,446 models remained). The model score is a score for the reliability of a model, derived from statistical potentials (F. Melo, R. Sanchez, A. Sali, 2001). A model is predicted to be good when the model score is higher than a pre-specified cutoff (0.7). A reliable model has a probability of the correct fold that is larger than 95%. A fold is correct when at least 30% of its Calpha atoms superpose within 3.5 Å of their correct positions. The ModPipe Protein Quality Score (MPQS) is a composite score comprising sequence identity to the template, coverage, and the three individual scores e-value, z-DOPE and GA341. We consider an MPQS of > 1.1 as reliable.
  2. (nutraceuticals excluded)
  3. Multi-target therapy may be more effective than single-target therapy to treat infectious diseases. Most of the proteins listed are potential novel drug targets for the development of efficient anti-tuberculosis chemotherapeutics. GSMN-TB: Genome Scale Metabolic Reaction Network of M.tb (http://sysbio.sbs.surrey.ac.uk/tb): 849 reactions, 739 metabolites, 726 genes. Can optimize the model for in vivo growth. Carry out multiple gene inhibition and compute the maximal theoretical growth rate (if close to zero, that combination of genes is essential for growth).
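The model selection described in note 1 can be sketched as a filter followed by a per-protein best pick. The cutoffs (model score > 0.7, ModPipe quality score > 1.1) and the tie-break order come from the note; the `Model` record, the function name and all data values below are invented for illustration. Dropping proteins with solved structures during the loop, rather than as a final step, gives the same result.

```python
from typing import NamedTuple


class Model(NamedTuple):
    protein: str
    model_score: float
    mpqs: float  # ModPipe Protein Quality Score


def select_models(models, solved_proteins, min_score=0.7, min_mpqs=1.1):
    """Keep one reliable model per protein that lacks a solved structure."""
    best = {}
    for m in models:
        # Both cutoffs are strict ("> 0.7" and "> 1.1").
        if m.model_score <= min_score or m.mpqs <= min_mpqs:
            continue
        if m.protein in solved_proteins:
            continue
        current = best.get(m.protein)
        # Best model score wins; MPQS breaks ties.
        if current is None or (m.model_score, m.mpqs) > (current.model_score, current.mpqs):
            best[m.protein] = m
    return best


models = [
    Model("p1", 0.90, 1.5),
    Model("p1", 0.90, 1.8),   # ties on model score, wins on MPQS
    Model("p2", 0.60, 2.0),   # fails the model score cutoff
    Model("p3", 1.00, 1.0),   # fails the MPQS cutoff
    Model("p4", 0.80, 1.2),
    Model("p5", 0.95, 1.6),   # protein already has a solved structure
]
selected = select_models(models, solved_proteins={"p5"})
print(sorted(selected))

# Bookkeeping from the note: 1,703 best models minus 251 (+6) removed leaves 1,446.
remaining = 1703 - 251 - 6
```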