Enriched research documents at the cutting edge:
When research papers no longer make sense on paper


                     Rudy Potenzone
                      SciencePoint Solutions



                           Lee Dirks
                Education & Scholarly Communication
                  Microsoft Research | Connections


     Presented at the American Chemical Society National Meeting
                      Denver CO, August 30, 2011
     at the Skolnick Award Symposium in Honor of Sandy Lawson
Agenda

•   Part 1 – The Scientific Paper
•   Part 2 – Emergence of the ePaper
•   Part 3 – Of Workflows and Add-ins
•   Part 4 – Impact of the ePaper
•   Part 5 – A Glimpse to the Future
A Brief History of Enriched Scientific Papers

• Research papers have long enjoyed the ability
  to exist on paper with enriched content
• Embed figures and associated electronic
  items
  – chemical structures that included full bonding
    and structural information
  – Crystallographic databases
  – Spectral databases
  – Biological sequence and Pathway databases
  – Supplemental material repositories
Issues with External Repositories

• Often not complete
• Poorly audited with some notable
  exceptions
• References between the paper and the
  files are often lost or incorrect
• There is a real loss of context due to the
  separation of all the information
• Reproducibility is not certain!
My Bio – a Content Perspective
• The NIH/EPA Chemical Information System
  – SANSS, MSSS, FRSS, etc.
• Chemical Abstracts Service
  – CA, Registry, CASREACT, CHEMCATS, SciFinder
• MDL Information Systems/Elsevier
  – ACD, various synthesis, Beilstein
• LION bioscience, Ingenuity Systems
  – SRS and Ingenuity Pathway Analysis (IPA)
• CambridgeSoft
  – ACX, etc.
Why Are We NOT Focusing On Authoring Tools?
On the Verge of a Major Revolution

• Technology that enables authors to create
  elaborate versions of results of research
• Capturing the full context of research in
  progress:
  –   The formal scientific report
  –   The very METHODS used
  –   Full data repository
  –   Complete workflows
• With the resulting documentation offering
  information for completely reproducible
  results
Envisioning a New Era of Research Reporting

• Reproducible Research
• Collaboration
• Interactive Data
• Dynamic Documents
• Reputation & Influence
Benefits of a Scientific ePaper

• Helping to improve the quality of science
• Facilitating the intellectual transfer of the core
  discoveries
• Fully documenting the provenance of the research
• Preserving the knowledge with complete context
• Services easily accessible on top of the data
   –   a new value-added layer
   –   visualization and analysis
   –   discovery through simulation and modeling
   –   etc.
• Accessible Reproducible Research!!
Reproducible Research

Scientific publications have at least two goals:
1. To announce a result
2. To convince readers that the result is correct
3. Preservation of knowledge

Jill P. Mesirov. Accessible Reproducible Research. Science, Vol. 327, No. 5964 (22 Jan 2010)
(from http://www.sciencemag.org/cgi/content/full/327/5964/415/DC1)
Fully Reproducible Content Driving Better Science

• Rich Original Content
• Workflow Process Embedded
• Content Sharing Services
• Full Data Content Embedded
Redefining the Document
• Microsoft introduced their open document
  format – OpenXML – in Office 2007
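
As a rough illustration of why an open, XML-based format matters here: an Office Open XML file is a ZIP package of XML parts, so ordinary tools can read and extend it. The sketch below builds a deliberately minimal, hypothetical package in memory (it is not a complete, valid .docx; the paragraph text and the function names are made up for illustration) and reads the paragraph text back using only the Python standard library.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used by word/document.xml in .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, stripped-down document part with a single paragraph.
DOCUMENT_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Example paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def build_package() -> bytes:
    """Write the document part into an in-memory ZIP container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", DOCUMENT_XML)
    return buf.getvalue()


def read_paragraphs(package: bytes) -> list[str]:
    """Open the ZIP container and collect the text of each w:p element."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        texts = [t.text or "" for t in p.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(texts))
    return paragraphs


paragraphs = read_paragraphs(build_package())
print(paragraphs)
```

Because the container is plain ZIP plus XML, an add-in can store extra parts (data, chemistry, provenance) alongside the visible text, which is what the projects on the following slides rely on.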
Project "Chem4Word" – Chemical Drawing in Microsoft Word
Semantic chemistry for students and publishers

• Author/edit 1D and 2D chemistry. Change chemical layout styles.
• Intent: Recognizes chemical dictionary and ontology terms
• Relationships: Navigate and link referenced chemistry
• Intelligence: Verifies validity of authored chemistry
• Data: Semantics stored in Chemistry Markup Language (CML)

   <?xml version="1.0" ?>
   <cml version="3" convention="org-synth-report"
        xmlns="http://www.xml-cml.org/schema">
    <molecule id="m1">
     <atomArray>
      <atom id="a1" elementType="C" x2="-2.9149999618530273" y2="0.7699999809265137" />
      <atom id="a2" elementType="C" x2="-1.5813208400249916" y2="1.5399999809265137" />
      <atom id="a3" elementType="O" x2="-0.24764171819695613" y2="0.7699999809265134" />
      <atom id="a4" elementType="O" x2="-1.5813208400249912" y2="3.0799999809265137" />
      <atom id="a5" elementType="H" x2="-4.248679083681063" y2="1.5399999809265137" />
      <atom id="a6" elementType="H" x2="-2.914999961853028" y2="-0.7700000190734864" />
      <atom id="a7" elementType="H" x2="-4.248679083681063" y2="-1.907348645691087E-8" />
      <atom id="a8" elementType="H" x2="1.0860374036310796" y2="1.5399999809265132" />
     </atomArray>
     <bondArray>
      <bond atomRefs2="a1 a2" order="1" />
      <bond atomRefs2="a2 a3" order="1" />
      <bond atomRefs2="a2 a4" order="2" />
      <bond atomRefs2="a1 a5" order="1" />
      <bond atomRefs2="a1 a6" order="1" />
      <bond atomRefs2="a1 a7" order="1" />
      <bond atomRefs2="a3 a8" order="1" />
     </bondArray>
    </molecule>
   </cml>

http://www.nytimes.com/2010/04/08/technology/personaltech/08askk.html?_r=1

V1.0 now available (binary and open source)
http://research.microsoft.com/chem4word/
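
To make the point about semantic chemistry concrete, here is a minimal sketch of what a reader's tool could do with CML like the fragment on this slide. It is not Chem4Word's own code: the function names are hypothetical, the 2D coordinates are omitted for brevity, and only the Python standard library is used. It derives a molecular formula from the atom list and checks that every bond refers to a declared atom.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace taken from the xmlns attribute of the CML fragment.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Same atoms and bonds as the slide's example; x2/y2 coordinates left out.
CML_TEXT = """<?xml version="1.0" ?>
<cml version="3" convention="org-synth-report"
     xmlns="http://www.xml-cml.org/schema">
 <molecule id="m1">
  <atomArray>
   <atom id="a1" elementType="C" />
   <atom id="a2" elementType="C" />
   <atom id="a3" elementType="O" />
   <atom id="a4" elementType="O" />
   <atom id="a5" elementType="H" />
   <atom id="a6" elementType="H" />
   <atom id="a7" elementType="H" />
   <atom id="a8" elementType="H" />
  </atomArray>
  <bondArray>
   <bond atomRefs2="a1 a2" order="1" />
   <bond atomRefs2="a2 a3" order="1" />
   <bond atomRefs2="a2 a4" order="2" />
   <bond atomRefs2="a1 a5" order="1" />
   <bond atomRefs2="a1 a6" order="1" />
   <bond atomRefs2="a1 a7" order="1" />
   <bond atomRefs2="a3 a8" order="1" />
  </bondArray>
 </molecule>
</cml>
"""


def hill_formula(cml_text: str) -> str:
    """Count atoms by element and format them in Hill order (C, H, then A-Z)."""
    root = ET.fromstring(cml_text)
    counts = Counter(a.get("elementType") for a in root.iterfind(".//cml:atom", CML_NS))
    order = []
    if "C" in counts:
        order = ["C"] + (["H"] if "H" in counts else [])
    order += sorted(e for e in counts if e not in order)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def dangling_bond_refs(cml_text: str) -> list[str]:
    """Return atom ids that a bond references but no atom element declares."""
    root = ET.fromstring(cml_text)
    ids = {a.get("id") for a in root.iterfind(".//cml:atom", CML_NS)}
    return [
        ref
        for bond in root.iterfind(".//cml:bond", CML_NS)
        for ref in bond.get("atomRefs2").split()
        if ref not in ids
    ]


formula = hill_formula(CML_TEXT)
dangling = dangling_bond_refs(CML_TEXT)
print(formula, dangling)
```

A drawing stored only as a picture supports neither check; with the structure stored as data, the document itself can be searched and validated.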
GenePattern Reproducible Research Add-in

• Services: Connects to GenePattern database
• Relationships: Inline graphics are synchronized to dataset
• Data: Control and execute query pipelines into GenePattern
• Data: Resulting data (and provenance) stored within Word document

Source code and binary:
http://GenepatternWordAddin.codeplex.com
Research Information Centre (RIC) Project
Virtual Research Environment (VRE) Toolkit for SharePoint

• Collaborative environment for research groups
• Personal site for each researcher and project site for each project
• Document management, federated search, social networking, real-time communication, blogs, wikis

Project Overview:
http://research.microsoft.com/ric/
http://research.microsoft.com/vre/

Version 1.1 (Open Source under Ms-PL):
http://ric.codeplex.com/
oreChem – The Chemical Semantic Web

• Lee Giles, Karl Mueller, Prasenjit Mitra
• Geoffrey Fox
• Carl Lagoze
• Jeremy Frey, Simon Coles
• Peter Murray-Rust, Jim Downing, Nico Adams

Demonstrating:
• Large collaboration project focusing on interoperability
• At-source capture of chemistry data
• Chemical structure search
• Compound object authoring
• Retrospective harvesting of chemistry data
• Reuse through common ORE data model
• Semantic authoring
• Virtualized triple storage

[Diagram labels: experiments, documents, scientists, text, measurements, molecules, data; semantic storage; compound document authoring; mash-up (re-use) of data]
Enabling the Chemical Semantic Web

“RSC Publishing and Southampton University drive
the chemical semantic web…”
Recent developments of interest

Elsevier's Article of the Future Competition
   Grand Challenge & Article of the Future contest -- ongoing collaboration between
   Elsevier and the scientific community to redefine how a scientific article is
   presented online.
PLoS Currents: Influenza
   A rapid research note service, run in conjunction with NIH & Google Knol. It provides
   an open-access online resource for immediate, open communication and discussion of
   new scientific data, analyses, and ideas in the field of influenza. All content is
   moderated by an expert group of influenza researchers but, in the interest of
   timeliness, does not undergo in-depth peer review.
Nature Precedings
   Connects thousands of researchers and provides a platform for sharing new and
   preliminary findings with colleagues on a global scale – via pre-print manuscripts,
   posters and presentations. Claim priority and receive feedback on your findings
   prior to formal publication.
Mendeley (and Papers)
   Called the “iTunes” for academic papers; 400,000+ users have signed up and a
   staggering 30+ million scientific papers have been uploaded.
Several Commercial Data Sharing + Analysis Services

• Swivel
• IBM’s “Many Eyes”
• Gapminder & Google’s Trendalyzer
• Metaweb’s “Freebase”
• CSA’s “Illustrata”
Harvard’s “Dataverse” Project




http://thedata.org

  Via web application software, data citation standards, and statistical methods, the
  Dataverse Network project increases scholarly recognition and distributed control
  for authors, journals, archives, teachers, and others who produce or organize data;
  facilitates data access and analysis for researchers and students; and ensures
  long-term preservation whether or not the data are in the public domain. [From the
  Institute of Quantitative Social Science (IQSS) at Harvard University]
Taverna

• Taverna is an open source and domain-
  independent Workflow Management System
   – A suite of tools used to design and execute scientific
     workflows and aid in silico experimentation.
• Taverna has been created by the myGrid team and
  funded through OMII-UK. The project has
  guaranteed funding until 2014.
• The Taverna Suite is written in Java and includes
  the Taverna Engine (used for enacting workflows)
  that powers both the Taverna Workbench
  (desktop client) and the Taverna Server.
More on Taverna

• Integrated with other myGrid tools
  – social networking and workflow sharing
    environment for scientists
  – curated catalogue of Web services for Life
    Sciences
Provenance

• Log what, where, when, who
• For data and for publications
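
One way to picture "what, where, when, who" logging is a small append-only record kept next to the data. The following is only a sketch under stated assumptions: the class names, the step names and the identifiers `researcher-1` and `lab-bench-3` are hypothetical, and a real system would typically follow a shared model such as the Open Provenance Model rather than an ad hoc structure like this one.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """One log entry answering what, where, when and who."""
    what: str
    where: str
    when: str
    who: str
    note: str = ""


class ProvenanceLog:
    """Append-only list of records for one person working at one location."""

    def __init__(self, who: str, where: str) -> None:
        self.who = who
        self.where = where
        self.records: list[ProvenanceRecord] = []

    def record(self, what: str, note: str = "") -> ProvenanceRecord:
        # Timestamps are taken in UTC so entries from different sites compare cleanly.
        entry = ProvenanceRecord(
            what=what,
            where=self.where,
            when=datetime.now(timezone.utc).isoformat(),
            who=self.who,
            note=note,
        )
        self.records.append(entry)
        return entry


# Hypothetical usage: two steps of an experiment, one with a free-text annotation.
log = ProvenanceLog(who="researcher-1", where="lab-bench-3")
log.record("weigh sample", note="balance reading entered by hand")
log.record("heat at reflux")

for entry in log.records:
    print(entry.when, entry.who, entry.where, entry.what, entry.note)
```

The same record shape would serve for a publication event (who submitted which version, when and where), which is the sense in which provenance applies to data and to publications alike.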
[Figure: a laboratory experiment captured as a to-do list, a plan, and a process record]

To Do List
  Ingredient list:
    Fluorinated biphenyl    0.9 g
    Br11OCB                 1.59 g
    Potassium Carbonate     2.07 g
    Butanone                40 ml
  Steps:
    – Dissolve 4-fluorinated biphenyl in butanone
    – Add K2CO3 powder
    – Heat at reflux for 1.5 hours
    – Cool and add Br11OCB
    – Heat at reflux until completion
    – Cool and add water (30ml)
    – Extract with DCM (3x40ml)
    – Combine organics, dry over MgSO4 & filter
    (later steps are cut off in the source)

Plan
  Add, Add, Reflux, Cool, Add, Reflux, Cool, Add, Liquid-liquid extraction, Dry, Filter

Process Record (measurements and annotations attached to the plan steps)
  – Annotate: Butanone dried via silica column and measured into 100ml RB flask.
    Used 1ml extra solvent to wash out container.
  – Measure: butanone, 40 ml
  – Weigh: sample of 4-fluorinated biphenyl, 0.9031 grammes
  – Weigh: sample of K2CO3 powder
  – Annotate: Started reflux at 13.30. (Had to change heater stirrer) Only reflux
    for 45min, next step 14:15.
  – Weigh: sample of Br11OCB
  – Measure: water
  – Annotate: Inorganics dissolve 2 layers. Added brine ~20ml.
  – Measure: DCM, 3 of 40 ml
  – Annotate: Organics are yellow
  – Measure: MgSO4, excess
  – Filter (Buchner)
  – Annotate: Washed MgSO4 with DCM ~ 50ml
myGrid Open Suite of Tools

• Workflow Repository
• Workflow GUI Workbench and 3rd party plug-ins
• Client User Interfaces
• Web Portals
• Service Catalogue
• Provenance Store
• Workflow Server
• Programming and APIs
• Activity and Service Plug-in Manager
• Open Provenance Model
• Secure Service Access, and Programming APIs
Recycling, Reuse, Repurposing

                              • Share
                              • Search
                              • Re-use
                              • Re-purpose
                              • Execute
                              • Communicate
                              • Record


            http://www.myexperiment.org/
Project Trident
Built on Windows Workflow Foundation

• Author, Execute and Monitor Workflows
• Compose and modify workflows via drag & drop canvas
• View data products, performance metrics, and provenance data

Version 1.2 (Open Source under Apache 2.0 License):
http://tridentworkflow.codeplex.com/
KNIME

• KNIME (Konstanz Information Miner)
• A user-friendly and comprehensive Open-Source platform for:
  – Data integration
  – Processing
  – Analysis
  – Exploration
• Growing vendor adoption
  – PerkinElmer, Schrödinger, Tripos, CCG,
    ChemAxon, etc.
Accelrys Pipeline Pilot

• Chemistry
• ADME
• Biology
• Genomics
Envisioning a New Era of Research Reporting

• Reproducible Research
• Interactive Data
• Collaboration
• Dynamic Documents
• Reputation & Influence

Imagine…
• Live research reports
    – multiple end-user ‘views’
    – dynamically tailor presentations
• An authoring environment that absorbs and encapsulates
    – research workflows
    – outputs from the lab experiments
• A report that can be dropped into an electronic lab workbench and
  reconstitute an entire experiment
• Dynamically mash up data and workflows across experiments
• Apply new analyses and visualizations and perform new in silico experiments
Impact of These Innovations

• On Science
• On the Business of Science
• On the Scientific Community

• And Other Emotional Factors . . .
Overall Impacts

• Authors will be somewhat inconvenienced to
  learn new things . . . But as readers and
  consumers it will clearly be beneficial!
• Across Industry and Academia it will be a
  positive advance
• The vendors will be skeptical and reluctant to
  change – but will move with the spending
  community!
On the Scientific Community

• This will provide a significantly more
  capable platform for science
  – Extending collaboration
  – Easing validation of research
  – Offering transfer of knowledge and ease of
    extension of research projects
• But it DOES further erode the status quo
  system of rewards and tenure!
And Other Emotional Factors
      Is There An Elephant In This Room??



• The Publishers??

• CAS?? Other A&I
  companies??

• Well what about
  Electronic Lab
  Notebooks??
On the Business of Science

• Publishers will need to continue to evolve
  to find a role as “cool provider” of these
  tools and become a “hot” distribution
  center
• A&I companies will need to redefine their
  role
• Software vendors have a real opportunity,
  if they can adapt . . .
The Value of the A & I Layers
Abstracting and Indexing in the Future

The Old Days
• Abstracting was Key
• True Assessment of Content

Today
• Indexing is Key
• Precision and Recall
• “Beats” Google every time

Going Forward
• Indexing with Context “Built-In”
• Will Abstracting or more correctly ‘Content Monitoring’ become the value add?
• Or be a reliable data aggregator?
Challenge or Opportunity

• Rich Content Sources
• Direct Search Tools
• Reproducible Science
• Complete Provenance
The Opportunity Before Us
   • Faster Development in an Increasingly
     Complex World
      –   Improving reproducibility of scientific results
      –   Data Sharing and collaboration services
      –   Reliable maintenance of provenance
      –   Faster availability and efficient query tools
      –   Secure and/or controlled access to data
      –   Finding related data and research partners
      –   Assurance that data will be preserved
   • A Brave New World for Scientific Discovery
     and Research
      – Cross-domain partnerships
      – Enhanced broad availability of data and prior
        research
   • Improved Knowledge Transfer
      – Both upstream and downstream
      – Realizing the promise of translational medicine
Thank You!

              Rudy Potenzone
            SciencePoint Solutions

           rudy@sciencepoint.com



                  Lee Dirks
     Education & Scholarly Communication
       Microsoft Research | Connections

ldirks@microsoft.com or scholar@microsoft.com
URL – http://www.microsoft.com/scholarlycomm/
Facebook: Scholarly Communication at Microsoft

Acs denver dirks potenzone 30 aug2011

  • 1. Enriched research documents at the cutting edge: When research papers no longer make sense on paper Rudy Potenzone SciencePoint Solutions Lee Dirks Education & Scholarly Communication Microsoft Research | Connections Presented at the American Chemical Society National Meeting Denver CO, August 30, 2011 at the Skolnick Award Symposium in Honor of Sandy Lawson
  • 2. Agenda • Part 1 – The Scientific Paper • Part 2 – Emergence of the ePaper • Part 3 – Of Workflows and Add-ins • Part 4 – Impact of the ePaper • Part 5 – A Glimpse to the Future
  • 3. Agenda • Part 1 – The Scientific Paper • Part 2 – Emergence of the ePaper • Part 3 – Of Workflows and Addins • Part 4 – Impact of the ePaper • Part 5 – A Glimpse to the Future 3
  • 4. A Brief History of Enriched Scientific Papers • Research papers have long enjoyed the ability to exist on paper with enriched content • Embed figures and associated electronic items – chemical structures that included full bonding and structural information – Crystallographic databases – Spectral databases – Biological sequence and Pathway databases – Supplemental material repositories
  • 5. Issues with External Repositories • Often not complete • Poorly audited with some notable exceptions • References between the paper and the files are often lost or incorrect • There is a real loss of context due to the separation of all the information • Reproducibility is not certain!
  • 6. Agenda • Part 1 – The Scientific Paper • Part 2 – Emergence of the ePaper • Part 3 – Of Workflows and Add-ins • Part 4 – Impact of the ePaper • Part 5 – A Glimpse to the Future
  • 7. My Bio – a Content Perspective • The NIH/EPA Chemical Information System – SANSS, MSSS, FRSS, etc. • Chemical Abstracts Service – CA, Registry, CASREACT, CHEMCATS, SciFinder • MDL Information Systems/Elsevier – ACD, various synthesis, Beilstein • LION bioscience, Ingenuity Systems – SRS and Ingenuity Pathway Analysis (IPA) • CambridgeSoft – ACX, etc.
  • 8. Why Are We NOT Focusing On Authoring Tools?
  • 9. On the Verge of a Major Revolution • Technology that enables authors to create elaborate versions of results of research • Capturing the full context of research in progress: – The formal scientific report – The very METHODS used – Full data repository – Complete workflows • With the resulting documentation offering information for completely reproducible results
  • 10. Envisioning a New Era of Research Reporting Reproducible Research Collaboration Interactive Data Dynamic Documents Reputation & Influence
  • 11. Benefits of a Scientific ePaper • Helping to improve the quality of science • Facilitating the intellectual transfer of the core discoveries • Fully documenting the provenance of the research • Preserving the knowledge with complete context • Services easily accessible on top of the data – a new value-added layer – visualization and analysis – discovery through simulation and modeling – etc. • Accessible Reproducible Research!!
  • 12. Reproducible Research Scientific publications have at least two goals: 1. to announce a result and 2. to convince readers that the result is correct. 3. Preservation of knowledge Jill P. Mesirov. Accessible Reproducible Research. Science Vol. 327 (22) Jan 2010 (from http://www.sciencemag.org/cgi/content/full/327/5964/415/DC1)
  • 13. Rich Original Content Fully Fully Reproducible Workflow Reproducible Content Process Content Content Sharing Embedded Driving Driving Better Services Better Science Science Full Data Content Embedded
  • 14. Agenda • Part 1 – The Scientific Paper • Part 2 – Emergence of the ePaper • Part 3 – Of Workflows and Add-ins • Part 4 – Impact of the ePaper • Part 5 – A Glimpse to the Future
  • 15. Redefining the Document • Microsoft introduced their open document format – OpenXML – in Office 2007
  • 16. Project "Chem4Word"– Chemical Drawing in Microsoft Word Semantic chemistry for students and publishers Author/edit 1D and 2D chemistry. Intent: Recognizes Change chemical layout styles. chemical dictionary Relationships: and ontology terms Navigate and link referenced chemistry Data: Semantics stored in Chemistry Markup Language (CML) <?xml version="1.0" ?> <cml version="3" convention="org-synth-report" xmlns="http://www.xml-cml.org/schema"> <molecule id="m1"> <atomArray> <atom id="a1" elementType="C" x2="- 2.9149999618530273" y2="0.7699999809265137" /> <atom id="a2" elementType="C" x2="- http://www.nytimes.com/2010/04/08/techn 1.5813208400249916" y2="1.5399999809265137" /> <atom id="a3" elementType="O" x2="- ology/personaltech/08askk.html?_r=1 0.24764171819695613" y2="0.7699999809265134" /> <atom id="a4" elementType="O" x2="- 1.5813208400249912" y2="3.0799999809265137" /> <atom id="a5" elementType="H" x2="-4.248679083681063" y2="1.5399999809265137" /> <atom id="a6" elementType="H" x2="-2.914999961853028" y2="-0.7700000190734864" /> Intelligence: Verifies validity <atom id="a7" elementType="H" x2="-4.248679083681063" y2="-1.907348645691087E-8" /> <atom id="a8" elementType="H" x2="1.0860374036310796" y2="1.5399999809265132" /> of authored chemistry </atomArray> <bondArray> <bond atomRefs2="a1 a2" order="1" /> <bond atomRefs2="a2 a3" order="1" /> <bond atomRefs2="a2 a4" order="2" /> <bond atomRefs2="a1 a5" order="1" /> <bond atomRefs2="a1 a6" order="1" /> <bond atomRefs2="a1 a7" order="1" /> <bond atomRefs2="a3 a8" order="1" /> </bondArray> </molecule> </cml> V1.0 now available (binary and open source) http://research.microsoft.com/chem4word/
  • 17. GenePattern Reproducible Research Add-in Services: Connects to GenePattern database Relationships: Inline graphics are synchronized to dataset Data: Control and execute query pipelines Data: Resulting data (and into GenePattern provenance) stored within Word document Source code and binary: http://GenepatternWordAddin.codeplex.com
  • 18. Research Information Centre (RIC) Project Virtual Research Environment (VRE) Toolkit for SharePoint Collaborative environment for research groups Personal site for each researcher and project site for each project Document management, federated search, social networking, real-time communication, blogs, wikis Project Overview: Version 1.1 (Open Source under Ms-PL): http://research.microsoft.com/ric/ http://ric.codeplex.com/ http://research.microsoft.com/vre/
  • 19. oreChem – The Chemical Semantic Web • Lee Giles • Geoffrey Fox • Carl Lagoze • Jeremy Frey • Peter Murray-Rust • Karl Mueller • Simon Coles • Jim Downing • Prasenjit Mitra • Nico Adams Demonstrating: • Large collaboration project focusing on interoperability • At-source capture of chemistry data Semantic storage • Chemical structure search • Compound object authoring • Retrospective harvesting of chemistry data • Reuse through common ORE data model • Semantic authoring • Virtualized triple storage experiments documents scientists text measurements molecules data data molecules Compound Mash-up (re-use) document of data authoring
  • 20. Enabling the Chemical Semantic Web “RSC Publishing and Southampton University drive the chemical semantic web…”
  • 21. Recent developments of interest Elsevier's Article of the Future Competition Grand Challenge & Article of the Future contest -- ongoing collaboration between Elsevier and the scientific community to redefine how a scientific article is presented online. PLoS Currents: Influenza In conjunction with NIH & Google Knol – a rapid research note service, enable this exchange by providing an open-access online resource for immediate, open communication and discussion of new scientific data, analyses, and ideas in the field of influenza. All content is moderated by an expert group of influenza researchers, but in the interest of timeliness, does not undergo in-depth peer review. Nature Preceedings Connects thousands of researchers and provides a platform for sharing new and preliminary findings with colleagues on a global scale – via pre-print manuscripts, posters and presentations. Claim priority and receive feedback on your findings prior to formal publication. Mendeley (and Papers) Called “iTunes” for academic papers; 400,000+ users have signed up and a staggering 30+ million scientific papers have been uploaded.
  • 22. Several Commercial Data Sharing + Analysis Services • Swivel • IBM’s “Many Eyes” • Gapminder & Google’s Trendalyzer • Metaweb’s “Freebase” • CSA’s “Illustrata”
  • 23. Harvard’s “Dataverse” Project http://thedata.org Via web application software, data citation standards, and statistical methods, the Dataverse Network project increases scholarly recognition and distributed control for authors, journals, archives, teachers, and others who produce or organize data; facilitates data access and analysis for researchers and students; and ensures long- term preservation whether or not the data are in the public domain. [From the Institute of Quantitative Social Science (IQSS) at Harvard University]
  • 24.
  • 25. Taverna • Taverna is an open source and domain- independent Workflow Management System – A suite of tools used to design and execute scientific workflows and aid in silico experimentation. • Taverna has been created by the myGrid team and funded through OMII-UK. The project has guaranteed funding until 2014. • The Taverna Suite is written in Java and includes the Taverna Engine (used for enacting workflows) that powers both the Taverna Workbench (desktop client) and the Taverna Server.
  • 26. More on Taverna • Integrated with other myGrid tools – social networking and workflow sharing environment for scientists – curated catalogue of Web services for Life Sciences
  • 27. Provenance Log what, where, when who For data and for publications To Do Ingredient List Dissolve 4- Add K2CO3 Heat at reflux Cool and add Heat at Cool and add Extract with Combine organics, List flourinated powder for 1.5 hours Br11OCB reflux until water (30ml) DCM dry over MgSO4 & Fluorinated biphenyl 0.9 g Br11OCB 1.59 g biphenyl in completion (3x40ml) filter Potassium Carbonate 2.07 g butanone Butanone 40 ml Plan Add Cool Add Reflux Liquid- Add Reflux Cool Add Dry Filter liquid extraction b Ev 0.9031 grammes excess g Inorganics dissolve 2 3 of 40 ml layers. Added brine ~20ml. text image Weigh Measure Measure Sample of 4- Butanone dried via silica column and Process measured into 100ml RB flask. flourinated Record Used 1ml extra solvent to wash out biphenyl Annotate container. DCM MgSO4 Annotate 1 1 2 2 1 3 1 4 3 5 2 6 2 7 4 8 9 10 11 Add Cool Add Reflux Add Add Reflux Cool Liquid- Dry Filter text Sample of liquid (Buchner) Butanone Annotate extraction b Sample of Br11OCB Water Annotate Annotate Ev K2CO3 Measure Powder Measure 27 Weigh Weigh text Started reflux at 13.30. (Had to change heater stirrer) Only reflux 40 text Washed MgSO4 with text ml for 45min, next step 14:15. Organics are yellow DCM ~ 50ml
  • 28. myGrid Open Suite of Tools [Diagram: Workflow Repository • Service Catalogue • Provenance Store • Workflow GUI Workbench and 3rd party plug-ins • Client User Interfaces • Web Portals • Workflow Server • Programming and APIs • Activity and Service Plug-in Manager • Open Provenance Model • Secure Service Access, and Programming APIs]
  • 29. Recycling, Reuse, Repurposing • Share • Search • Re-use • Re-purpose • Execute • Communicate • Record http://www.myexperiment.org/
  • 30. Project Trident • Built on Windows Workflow Foundation • Author, Execute and Monitor Workflows • Compose and modify workflows via drag & drop canvas • View data products, performance metrics, and provenance data • Version 1.2 (Open Source under Apache 2.0 License): http://tridentworkflow.codeplex.com/
  • 31. KNIME • KNIME (Konstanz Information Miner) • A user-friendly and comprehensive Open-Source platform for: – Data integration – Processing – Analysis – Exploration • Growing vendor adoption – PerkinElmer, Schrödinger, Tripos, CCG, ChemAxon, etc.
  • 36. Envisioning a New Era of Research Reporting Imagine… • Live research reports – multiple end-user ‘views’ – dynamically tailor presentations • An authoring environment that absorbs and encapsulates – research workflows – outputs from the lab experiments • A report that can be dropped into an electronic lab workbench and reconstitute an entire experiment • Dynamic mash up data and workflows across experiments • Apply new analyses and visualizations and perform new in silico experiments [Sidebar themes: Reproducible Research • Interactive Data • Collaboration • Dynamic Documents • Reputation & Influence]
  • 37. Agenda • Part 1 – The Scientific Paper • Part 2 – Emergence of the ePaper • Part 3 – Of Workflows and Add-ins • Part 4 – Impact of the ePaper • Part 5 – A Glimpse to the Future
  • 38. Impact of These Innovations • On Science • On the Business of Science • On the Scientific Community • And Other Emotional Factors . . .
  • 39. Overall Impacts • Authors will be somewhat inconvenienced to learn new things . . . But as readers and consumers it will clearly be beneficial! • Across Industry and Academia it will be a positive advance • The vendors will be skeptical and reluctant to change – but will move with the spending community!
  • 40. On the Scientific Community • This will provide a significantly more capable platform for science – Extending collaboration – Easing validation of research – Offering transfer of knowledge and ease of extension of research projects • But it DOES further erode the status quo system of rewards and tenure!
  • 41. And Other Emotional Factors Is There An Elephant In This Room?? • The Publishers?? • CAS?? Other A&I companies?? • Well what about Electronic Lab Notebooks??
  • 42. On the Business of Science • Publishers will need to continue to evolve to find a role as “cool provider” of these tools and become a “hot” distribution center • A&I companies will need to redefine their role • Software vendors have a real opportunity, if they can adapt . . .
  • 43. The Value of the A & I Layers: Abstracting and Indexing in the Future • The Old Days – Abstracting was Key – True Assessment of Content • Today – Indexing is Key – Precision and Recall – “Beats” Google every time • Going Forward – Indexing with Context “Built-In” – Will Abstracting or more correctly ‘Content Monitoring’ become the value add? – Or be a reliable data aggregator?
  • 44. Agenda • Part 1 – The Scientific Paper • Part 2 – Emergence of the ePaper • Part 3 – Of Workflows and Add-ins • Part 4 – Impact of the ePaper • Part 5 – A Glimpse to the Future
  • 45. Challenge Or Opportunity • Rich Content Sources • Direct Search Tools • Reproducible Science • Complete Provenance
  • 46. The Opportunity Before Us • Faster Development in an Increasingly Complex World – Improving reproducibility of scientific results – Data Sharing and collaboration services – Reliable maintenance of provenance – Faster availability and efficient query tools – Secure and/or controlled access to data – Finding related data and research partners – Assurance that data will be preserved • A Brave New World for Scientific Discovery and Research – Cross-domain partnerships – Enhanced broad availability of data and prior research • Improved Knowledge Transfer – Both upstream and downstream – Realizing the promise of translational medicine
  • 47. Thank You! Rudy Potenzone SciencePoint Solutions rudy@sciencepoint.com Lee Dirks Education & Scholarly Communication Microsoft Research | Connections ldirks@microsoft.com or scholar@microsoft.com URL – http://www.microsoft.com/scholarlycomm/ Facebook: Scholarly Communication at Microsoft