Alex D. Wade
Senior Research Program Manager
External Research
Microsoft Research
Microsoft Corporation
• Science @ Microsoft
  – and the role of Scholarly Communication

• Office 2007
  – File Format Overview
  – Bibliography Support
  – UI Extensibility

• A Sampling of Related Projects
Putting computing into science…
   Applying Microsoft products and research technologies to
   advance the scientific research and engineering innovation
   process

Putting science into computing…
   Ensuring that research community requirements are factored into
   future versions of Microsoft software

• Advancement of Science
• Global Collaboration
• Technology Excellence
• Interoperability
• Science + computation are not the entire equation
  • Authoring, Analysis, Publishing, Discoverability, and Data
    Storage/Preservation are key components to scientists’
    everyday work…and Microsoft’s core businesses
• The scholarly community has made it clear to us:
  • Microsoft must improve its offerings throughout the
    scholarly communication lifecycle


• Our approach: Conduct prototyping projects and
  proofs-of-concept to evolve Microsoft’s scholarly
  communication offerings
•   Data Acquisition and Modeling
      –   Data capture from source, cleaning, storage, etc.
      –   SQL Server, SQL Integration Services, Windows Workflow Foundation
 •   Support Collaboration
      –   Allow researchers to work together, share context, facilitate interactions
      –   SharePoint Server, OneNote 2007 (shared)
 •   Data Analysis, Modeling, and Visualization
      –   Mining techniques (OLAP, cubes) and visual analytics
      –   SQL Analysis Services, BI, Excel, Optima, SILK (MSR-A)
 •   Disseminate and Share Research Outputs
      –   Publish, Present, Blog, Review and Rate
      –   Word, PowerPoint
 •   Archiving
      –   Published literature, reference data, curated data, etc.
      –   SQL Server

Microsoft is the only company that can offer end-to-end support
•   Optimize for data-driven research & science
     –   Applies to both data (scientific) and information (scholarly publications)
     –   Reproducible research + computational science
     –   Properly document / annotate scholarly output
•   Interoperability is paramount
     –   Actively lobby and drive for consensus around technical standards and standardized protocols proactively
         adopted by the community; enable broad community engagement
           •   Customers have told Microsoft that interoperability (and intellectual property) are OUR responsibility

•   Data preservation (and provenance) should be baseline
     –   Documentation of the data’s provenance
     –   Reliable and secure long-term storage – at a massive scale
     –   Preservation needs to be like “accessibility” features – i.e., assumed as required
•   Social networking & semantic knowledge discovery
     –   Harnessing collective intelligence must be a consideration – since accessing research is a core step in the
         life-cycle. Enable knowledge discovery
     –   Optimize for Web 2.0 scenarios and allow end-users/experts to find things more easily
•   Metadata conventions / taxonomies / ontologies
     –   This is a crucial strength for libraries – and a critical component in enabling Web 2.0
• New file format
  – New file extension (DOCX)
  – All content expressed in XML (Office Open XML)
  – Contained in a zip file (OPC)


• Ecma-376 specification and ISO standard (ISO/IEC 29500)
  – OpenXML
  – Open Packaging Conventions
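
Because the package is an ordinary zip archive, its parts can be listed with any zip library. The sketch below uses Python's standard zipfile module. The make_demo_package helper is a hypothetical stand-in that builds a tiny archive in memory so the example runs without a real .docx; it is not a valid Word document.

```python
import io
import zipfile


def list_parts(package_bytes):
    """Return the sorted part names stored in a zip-based package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return sorted(zf.namelist())


def read_part(package_bytes, part_name):
    """Return one part of the package decoded as UTF-8 text."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return zf.read(part_name).decode("utf-8")


def make_demo_package():
    """Hypothetical stand-in: a tiny archive with docx-like part names.

    The XML bodies are placeholders, so Word would not open this file.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document/>")
        zf.writestr("word/media/image1.png", b"")
    return buffer.getvalue()
```

With a real file, the same functions would be called on the bytes read from disk, for example list_parts(open("paper.docx", "rb").read()), where paper.docx is an assumed file name.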
• Easy to access the different parts of document
   – XML file
   – Images
   – Annotations

• Simpler to transform Word’s XML into other XML formats
  or extract relevant data

• Ability to build .docx files programmatically or through
  transformations

• Ability to extend Word UI (and content) to support
  additional or custom data
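
A minimal sketch of the two directions mentioned above: building a .docx programmatically and extracting data back out of its XML. It assumes the smallest package layout (content types, package relationships, main document part). The function names are illustrative, and the output has not been checked against every Word version.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.'
    'wordprocessingml.document.main+xml"/>'
    "</Types>"
)

PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    "</Relationships>"
)


def build_docx(paragraphs):
    """Return the bytes of a minimal package with one w:p per input string."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in paragraphs
    )
    document = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", PACKAGE_RELS)
        zf.writestr("word/document.xml", document)
    return buffer.getvalue()


def extract_paragraphs(docx_bytes):
    """Return the text of each paragraph in the main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]
```

A round trip such as extract_paragraphs(build_docx(["Hello"])) returns the input paragraphs, which is the property a transformation pipeline would rely on.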
• Compatibility pack
  – Open and save to docx from older Word versions

• Add-in to export to PDF or XPS

• ODF Converter
  – Open Source project on SourceForge
  – Provides two-way conversion between ODF and
    OpenXML (WordprocessingML, SpreadsheetML, and
    PresentationML)
  – ‘Save As ODF’ to be included in Office 2007 SP2
• Manual Entry of Source Metadata
• Sources saved as Bibliography XML
• Sources.XML contains all sources
• Sources can be imported into new documents
  for easy reuse
• Sources.XML can be shared between users
• Documentation Styles are XSLTs
• Citations and Bibliographies can be inserted
  inline with a single click




• Automatically Formatted according to active
  Documentation Style
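
Since sources are stored as XML, they can be read outside Word as well. The sketch below parses a small bibliography fragment and formats an inline citation. The sample source (Doe, 2008) is invented for illustration, the element names follow the Office bibliography schema as far as we know it, and the Python formatter is a simplified stand-in for a real documentation style, which Word implements as an XSLT.

```python
import xml.etree.ElementTree as ET

NS = {"b": "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"}

# Hypothetical sample data; not taken from a real Sources.xml file.
SAMPLE_SOURCES = """\
<b:Sources xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography">
  <b:Source>
    <b:Tag>Doe08</b:Tag>
    <b:SourceType>JournalArticle</b:SourceType>
    <b:Title>An Example Article</b:Title>
    <b:Year>2008</b:Year>
    <b:Author>
      <b:Author>
        <b:NameList>
          <b:Person><b:Last>Doe</b:Last><b:First>Jane</b:First></b:Person>
        </b:NameList>
      </b:Author>
    </b:Author>
  </b:Source>
</b:Sources>
"""


def load_sources(xml_text):
    """Map each source tag to a small dict of title, year and author surnames."""
    root = ET.fromstring(xml_text)
    sources = {}
    for src in root.findall("b:Source", NS):
        tag = src.findtext("b:Tag", default="", namespaces=NS)
        sources[tag] = {
            "title": src.findtext("b:Title", default="", namespaces=NS),
            "year": src.findtext("b:Year", default="", namespaces=NS),
            "authors": [
                person.findtext("b:Last", default="", namespaces=NS)
                for person in src.findall(".//b:Person", NS)
            ],
        }
    return sources


def inline_citation(source):
    """Format an author-year citation; a stand-in for an XSLT style."""
    names = " & ".join(source["authors"]) or "Anon."
    return f"({names}, {source['year']})"
```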
• Ribbon Control
• Research Pane
• Smart Tags
• Tools for Authors
  – Search Commands in Office
  – Ribbon for Researchers
• Semantic Information
  – Ontology-based markup of scholarly papers
  – Authoring of chemical drawings + semantic information
  – NLM DTD (Pablo Fernicola)
• Data Preservation & Access
  – File format preservation + interoperability
  – Scientific datasets for research reproducibility
  – Publisher submission workflow for dataset archiving
Search Commands in Office
Office Labs

Goals
•   Office 2007 Add-in that aids in finding commands, options, wizards and
    galleries in Word, Excel and PowerPoint
•   Includes Guided Help, which acts as a tour guide for specific tasks
Project Status
•   Available now via http://www.officelabs.com/projects/searchcommands/
Ribbon for Researchers
Concept

• Search against the Live Search Academic service straight from within Word
• One-click insert to the bibliography
• Integration with various services
Semantic Markup in Word 2007
with UC San Diego

Goals
•   Semantic markup using domain-specific ontologies and controlled vocabularies
•   Facilitate/automate referencing to PDB (and other resources) from manuscript
•   A domain-specific ontology is downloaded and made available from within
    Microsoft Word 2007
•   Authors can record their intention, the meaning of the terms they use based on
    their community’s agreed vocabulary
Project Status
•   Phase 1 complete
•   Beta testing with PLoS later this year
• Domain-specific ontology
• Support for annotations straight from within Word
• Annotations travel with the document
• Can be used to improve domain-specific discovery of information, cross-linking, etc.
Chemistry Drawing for Office
Preliminary investigation

Goals
• Support students/researchers in simple chemistry structure
  authoring/editing
• Storage and transportability of semantic chemical data, not just images, via
  Chemical Markup Language (CML)
• Enable automatic extraction/harvesting of chemical data
Project Status
• Early investigation stage
• Will be encouraging on-going publisher feedback
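
To illustrate what harvesting semantic chemical data could look like, here is a minimal sketch that reads atoms and bonds from a CML fragment. The water molecule is a hand-written example and the helper names are assumptions; this is not the project's own code.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hand-written illustrative fragment describing water.
WATER_CML = """\
<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>
"""


def element_counts(cml_text):
    """Count atoms per element symbol in a CML molecule."""
    root = ET.fromstring(cml_text)
    return Counter(
        atom.get("elementType") for atom in root.iterfind(".//cml:atom", CML_NS)
    )


def bonds(cml_text):
    """Return each bond as a tuple of the two atom ids it connects."""
    root = ET.fromstring(cml_text)
    return [
        tuple(bond.get("atomRefs2").split())
        for bond in root.iterfind(".//cml:bond", CML_NS)
    ]
```

An embedded image of the same structure would carry none of this information, which is the motivation for storing CML alongside the drawing.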
PLANETS
Long-term Preservation of Digital Objects

Organization
•   EU Commission Project, €14M for 4 years
•   Consortium of 5 national libraries, 4 national archives, 4 universities and 4
    industry partners
Goals
•   Tools and methods for sustainable long-term preservation of digital objects
•   Preservation of Office Documents based on OpenXML
Project Status
•   OpenXML conversion tools available now:
      – http://research.microsoft.com/research/rpp/projects/MSConversionTools/OpenXMLConversionTools.htm
GenePattern for Word 2007
with Broad Institute @ MIT

Goals
• Integrate data/images from GenePattern workflows into research papers
• Allow for research reproducibility by combining data with the text
• Highlight OpenXML and Office 2007 technologies and break new research
  ground with the integration of data & workflows with research papers
• Testing/linkage to other labs – moving beyond initial installation

Project Status
• Currently in final phase of testing
• Will move into production in June 2008
• Code to be published at http://www.codeplex.com
Data Archive Project
with Johns Hopkins University

Goals
• Mechanism for long-term preservation of data sets
• Authoring tool to support creation of relationship resource map
• Use of OAI-ORE resource maps for collection description
• Workflow for text & data linkage between publisher and data archive
Workflow (author → publisher → archive)
• Author: Word 2007 OPC format contains data set(s) as well as resource map of relationships.
• Publisher: retains article and replaces it with the article URL. Forwards data to Data Archive.
• Archive: stores data set(s) and returns data set URL(s) to publisher as part of updated resource map.
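
The hand-off between author, publisher and archive can be sketched as three small steps that each update the resource map. Everything here is hypothetical: the class and function names, the "package:" locations and the example.org-style URLs are placeholders. A plain dictionary stands in for a real OAI-ORE resource map, which would be serialized as RDF/XML or Atom.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Submission:
    """What travels between parties: article, data sets, resource map."""
    article: Optional[bytes]
    datasets: Dict[str, bytes]
    resource_map: Dict[str, str] = field(default_factory=dict)


def author_package(article, datasets):
    """Author step: everything lives inside the package at first."""
    resource_map = {"article": "package:/word/document.xml"}
    for name in datasets:
        resource_map[name] = f"package:/data/{name}"
    return Submission(article, dict(datasets), resource_map)


def publisher_step(submission, article_url):
    """Publisher keeps the article and forwards data plus an updated map."""
    forwarded_map = dict(submission.resource_map, article=article_url)
    return Submission(None, dict(submission.datasets), forwarded_map)


def archive_step(forwarded, archive_base_url, store):
    """Archive stores each data set and returns the map with data set URLs."""
    updated = dict(forwarded.resource_map)
    for name, payload in forwarded.datasets.items():
        store[name] = payload
        updated[name] = f"{archive_base_url}/{name}"
    return updated
```

After the last step every entry in the map points at a URL rather than at a location inside the package, so the published article and the archived data stay linked.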
•   Direct publisher/repository submission via Word
•   Research Output Repository Platform
•   Conference Management Tool
•   eJournal Service
•   …


                    Alex D. Wade
              alex.wade@microsoft.com
         http://www.microsoft.com/science/
Compatibility packs for older versions of Word
• http://www.microsoft.com/downloads/details.aspx?FamilyId=941B3470-3A

Add-in for saving to PDF or XPS
• http://www.microsoft.com/downloads/details.aspx?FamilyId=4D951911-3E

SDK for OpenXML formats
• http://msdn2.microsoft.com/en-us/library/bb448854.aspx

Developer community forum
• http://openxmldeveloper.org/

Open Source OpenXML/ODF converter (both ways)
• http://sourceforge.net/projects/odf-converter/
Microsoft ventures into open access chemistry
Royal Society of Chemistry
By Richard van Noorden
January 29th, 2008
http://www.rsc.org/chemistryworld/News/2008/January/29010803.asp
Computational chemists have secured funding from computing giant Microsoft to showcase how chemistry can benefit from open access data sharing on the
internet.

The two-year eChemistry pilot project represents 'a major test case' for proposed new protocols for sharing scholarly information over the web, said Lee Dirks,
director of scholarly communications at Microsoft Research. Microsoft's support is also a boost for the small band of chemists keen to promote open access
internet publishing.

The public-private collaboration is one of many Microsoft projects to probe the potential of computing to advance scientific research,
and bring back what they learn to improve the company's product line, Dirks told Chemistry World. 'But chemistry is a discipline we've not
typically worked in,' he said. 'From everything I've heard, it's not as progressive a field as, say, astronomy in use of the web'.

Most chemical information on the web is published in closed journals and databases which guarantee high quality but also require a subscription to view.
Pre-print servers, collaborative documents, open databases, video sites, online lab notebooks and blogs provide other ways of communicating research. Combining
the lot offers the enticing prospect of a vast, free-to-access repository. This could transform the sharing of scientific research if the disparate data
sources were machine-readable, so that a search engine could automatically gather data about a particular molecule from a crystal
structure, a movie, an online lab book, and an archived article, for example.

Radical change
The international standards required for this challenge are being developed by the Open Archives Initiative Object Reuse and Exchange Project (OAI-ORE),
based at Cornell University, Ithaca, US. Their model protocols will be officially launched on 3 March at Johns Hopkins University in Maryland.

The eChemistry project, Dirks explained, was chosen as an exemplar to show that the new standards are actually useful to scientists. Chemists and computer
scientists at Cambridge and Southampton universities in the UK, and Indiana, Cornell, and Penn State in the US, will search and index existing online
databases and print archives; and work out how best to record chemistry data captured in lab experiments. The results will be hosted by the US National
Institutes of Health open access PubChem database and other repositories.
http://chronicle.com/daily/2008/02/1585n.htm
Monday, February 11, 2008

Researchers Develop Online Tools for Science Collaborations
By LILA GUTERMAN

Blogs, wikis, and social-networking sites such as Facebook may get media buzz these days, but for scientists, engineers, and doctors, they are not even on the radar.
The most effective tools of the Internet for such people tend to be efforts more narrowly targeted to their needs, such as software that helps geneticists replicate one
another's experiments. That was the underlying message of many presentations at the annual conference of the Professional/Scholarly Publishing Division of the
Association of American Publishers held here last week.

Philip E. Bourne, a professor of pharmacology at the University of California at San Diego, spoke about the Web site SciVee, where scientists can link
videos to their research papers that appear in open-access biomedical journals (The Chronicle, August 21, 2007). Mr. Bourne, who created the site,
calls the videos pubcasts; they are typically about 10 minutes long and go into more detail than an abstract but less than the full-length article.

The videos are coming in at a trickle, says Mr. Bourne. (He attributes the slow rate to the high quality: the graduate students and postdoctoral
researchers who make the videos have been crafting polished presentations.) But some of the ones already online have been viewed more than
100,000 times. When the pubcasts are uploaded, Mr. Bourne has also witnessed a steep increase in downloads of the linked article.

Jill P. Mesirov described an application that she hopes will ultimately become mainstream for journals that publish computational science. Ms. Mesirov,
director of computational biology and bioinformatics at the Broad Institute of Massachusetts Institute of Technology and Harvard University, has
designed a way to make computational work repeatable by other scientists.

The software, called GenePattern, stores both data and analytical routines. As the researcher works to collect and analyze the data, GenePattern
records the steps the scientist has taken, so that anyone else can follow the steps and check the result or expand on the method using new data. Ms.
Mesirov said that more than 6,000 people from more than 100 countries use the software.

She is now working with Microsoft to link such information to manuscripts that could be published online by peer-reviewed journals, to give
readers access to a researcher's computational methods. "One of the problems with publishing a paper that relies heavily on computational work,"
she said, "is that all of the methods that you would need to reproduce it never appear in the journal. If you're lucky, they're in the supplementary material
[online]. How much better if the journal had a link to the paper which had the data and an instantiation of the method embedded right in that paper."
How can we be sure we’ll remember our digital past?
Christian Science Monitor
By Chris Gaylord
February 13th 2008
http://www.csmonitor.com/2008/0214/p13s02-stct.html

Fading media, formats
The problem of digital preservation reaches across two standards. There's the media – floppies, CDs, hard drives – and the format of the files
themselves – does it run in DOS, Hypercard, ClarisWorks 2.0?

Microsoft tackles this issue of "legacy" computing by running a kind of corporate museum. The company protects its multiplatform history by
preserving old copies of "every major hardware and software change," says Lee Dirks, director of Scholarly Communications at Microsoft and a task
force member.

"We've got computers stored on campus that go back to the Altair, the first computer [to run Microsoft software]," he says. "In fact, we bought
multiple copies of the Altair just in case."

But maintaining antique computers is a costly way to keep the past alive.

A concept that is gaining momentum, Mr. Dirks says, is emulation, where programmers trick modern computers into thinking the way
their classic cousins did. This lets them run old software without retro machines. Another problem arises when the emulator itself is
written for last generation's operating systems. Do you write an emulator to handle the original emulator?

A more likely approach to long-term preservation is migration, says Berman. This calls for updating the file format every generation –
without changing the contents, one hopes. This method has problems, as well. Some of the original context will be lost in translation,
says Dirks. Also, the scale of the conversion will snowball as the number, size, and back-catalog of the files increases with each
passing generation of technology.
•   ICSTI Annual 2007 – Jun07
•   Nature Asia-Pacific Summit – Jun07
•   CODATA Summer School – Jul07
•   DCC Annual 2007 – Dec07
•   iSchool Conference 2008 – Feb08
•   OAI-ORE Launch – Mar08
•   BioMed Central 2007 Research Awards – Mar08
•   Open Repositories 2008 – Apr08
•   JCDL Annual 2008 – Jun08
• “Global Research Library 2020” with University of Washington
  (Oct07 and Mar08)

• Participating in two applications to the final round of the NSF
  “DataNet” solicitation (as an unfunded partner)

• Sponsoring BioMed Central’s 2007 Research Awards (Mar08)

• Aug07 Issue of CT Watch Quarterly (v. 3, no. 3)
“The Coming Revolution in Scholarly Communications & Cyberinfrastructure”
http://www.ctwatch.org/quarterly/articles/2007/08/


• New Scholarly Publishing website at:
–   http://www.microsoft.com/mscorp/tc/scholarly-publishing.mspx

394 wade word2007-ssp2008

  • 1. Alex D. Wade Senior Research Program Manager External Research Microsoft Research Microsoft Corporation
  • 2. • Science @ Microsoft – and the role of Scholarly Communication • Office 2007 – File Format Overview – Bibliography Support – UI Extensibility • A Sampling of Related Projects
  • 3. Putting computing into science… Applying Microsoft products and research technologies to advance the scientific research and engineering innovation process Putting science into computing… Ensuring that research community requirements are factored into future versions of Microsoft software • Advancement of Science • Global Collaboration • Technology Excellence • Interoperability
  • 4. • Science + computation are not the entire equation • Authoring, Analysis, Publishing, Discoverability, and Data Storage/Preservation are key components to scientists’ everyday work…and Microsoft’s core businesses • The scholarly community has made it clear to us: • Microsoft must improve its offerings throughout the scholarly communication lifecycle • Our approach: Conduct prototyping projects and proofs-of-concept to evolve Microsoft’s scholarly communication offerings
  • 5. Data Acquisition and Modeling – Data capture from source, cleaning, storage, etc. – SQL Server, SQL Integration Services, Windows Workflow Foundation • Support Collaboration – Allow researchers to work together, share context, facilitate interactions – SharePoint Server, OneNote 2007 (shared) • Data Analysis, Modeling, and Visualization – Mining techniques (OLAP, cubes) and visual analytics – SQL Analysis Services, BI, Excel, Optima, SILK (MSR-A) • Disseminate and Share Research Outputs – Publish, Present, Blog, Review and Rate – Word, PowerPoint • Archiving – Published literature, reference data, curated data, etc. – SQL Server Microsoft is the only company that can offer end-to-end support
  • 6. Optimize for data-driven research & science – To both data (scientific) and to information (scholarly publications) – Reproducible research + computational science – Properly document / annotate scholarly output • Interoperability is paramount – Actively lobby and drive for consensus around technical standards and standardized protocols proactively adopted by the community; enable broad community engagement • Customers have told Microsoft that the interoperability (and intellectual property) are OUR responsibility • Data preservation (and provenance) should be baseline – Documentation of the data’s provenance – Reliable and secure long-term storage – at a massive scale – Preservation needs to be like “accessibility” features – i.e., assumed as required • Social networking & semantic knowledge discovery – Harnessing collective intelligence must be a consideration – since accessing research is a core step in the life-cycle. Enable knowledge discovery – Optimize for Web 2.0 scenarios and allow end-users/experts to find things easier • Metadata conventions / taxonomies / ontologies – This is a crucial strength for libraries – and a critical component in enabling Web 2.0
  • 7. • New file format – New file extension (DOCX) – All content expressed in XML (Office Open XML) – Contained in a zip file (OPC) • ECMA-376 specification & ISO Standard – OpenXML – Open Packaging Conventions
  • 8. • Easy to access the different parts of document – XML file – Images – Annotations • Simpler to transform Word’s XML into other XML formats or extract relevant data • Ability to build .docx files programmatically or through transformations • Ability to extend Word UI (and content) to support additional or custom data
  • 9. • Compatibility pack – Open and save to docx from older Word versions • Add-in to export to PDF or XPS • ODF Converter – Open Source project on SourceForge – Provides two-way conversion between ODF and OpenXML (WordprocessingML, SpreadsheetML, and PresentationML) – ‘Save As ODF’ to be included in Office 2007 SP2
  • 10. • Manual Entry of Source Metadata
  • 11. • Sources saved as Bibliography XML • Sources.XML contains all sources • Sources can be imported into new documents for easy reuse • Sources.XML can be shared between users • Documentation Styles are XSLTs
  • 12. • Citations and Bibliographies can be inserted inline with a single click • Automatically Formatted according to active Documentation Style
  • 13. • Ribbon Control • Research Pane • Smart Tags
  • 15. • Tools for Authors – Search Commands in Office – Ribbon for Researchers • Semantic Information – Ontology-based markup of scholarly papers – Authoring of chemical drawings + semantic information – NLM DTD (Pablo Fernicola) • Data Preservation & Access – File format preservation + interoperability – Scientific datasets for research reproducibility – Publisher submission workflow for dataset archiving
  • 16. Search Commands in Office (Office Labs) Goals • Office 2007 Add-in that aids in finding commands, options, wizards and galleries in Word, Excel and PowerPoint • Includes Guided Help, which acts as a tour guide for specific tasks Project Status • Available now via http://www.officelabs.com/projects/searchcommands/
  • 18. Ribbon for Researchers: Concept
  • 19. Search against the Live Search Academic service straight from within Word • One-click insert to the bibliography • Integration with various services
  • 20. Semantic Markup in Word 2007 with UC San Diego Goals • Semantic markup using domain-specific ontologies and controlled vocabularies • Facilitate/automate referencing to PDB (and other resources) from manuscript • A domain-specific ontology is downloaded and made available from within Microsoft Word 2007 • Authors can record their intention, the meaning of the terms they use based on their community’s agreed vocabulary Project Status • Phase 1 complete • Beta testing with PLoS later this year
  • 21. Domain-specific ontology Annotations travel with the document Can be used to improve domain-specific discovery of information, cross-linking, etc. Support for annotations straight from within Word
  • 22. Chemistry Drawing for Office: Preliminary investigation Goals • Support students/researchers in simple chemistry structure authoring/editing • Storage and transportability of semantic chemical data, not just images, via Chemical Markup Language (CML) • Enable automatic extraction/harvesting of chemical data Project Status • Early investigation stage • Will be encouraging on-going publisher feedback
  • 23. PLANETS: Long-term Preservation of Digital Objects Organization • EU Commission Project, €14M for 4 years • Consortium of 5 national libraries, 4 national archives, 4 universities and 4 industry partners Goals • Tools and methods for sustainable long-term preservation of digital objects • Preservation of Office Documents based on OpenXML Project Status • OpenXML conversion tools available now: – http://research.microsoft.com/research/rpp/projects/MSConversionTools/OpenXMLConversionTools.htm
  • 24. GenePattern for Word 2007 with Broad Institute @ MIT Goals • Integrate data/images from GenePattern workflows into research papers. • Allow for research reproducibility by combining data with the text • Highlight OpenXML and Office 2007 technologies and break new research ground with the integration of data & workflows with research papers • Testing/linkage to other labs – moving beyond initial installation Project Status • Currently in final phase of testing • Will move into production in June 2008 • Code to be published http://www.codeplex.com
  • 26. Data Archive Project with Johns Hopkins University Goals • Mechanism for long-term preservation of data sets • Authoring tool to support creation of relationship resource map • Use of OAI-ORE resource maps for collection description • Workflow for text & data linkage between publisher and data archive
  • 27. Author: Word 2007 OPC format contains data set(s) as well as resource map of relationships. • Publisher: Publisher retains article and replaces it with the article URL. Forwards data to Data Archive. • Archive: Archive stores data set(s) and returns data set URL(s) to publisher as part of updated resource map.
  • 28. Direct publisher/repository submission via Word • Research Output Repository Platform • Conference Management Tool • eJournal Service • … Alex D. Wade alex.wade@microsoft.com http://www.microsoft.com/science/
  • 29. Compatibility packs for older versions of Word • http://www.microsoft.com/downloads/details.aspx?FamilyId=941B3470-3A Add-in for saving to PDF or XPS • http://www.microsoft.com/downloads/details.aspx?FamilyId=4D951911-3E SDK for OpenXML formats • http://msdn2.microsoft.com/en-us/library/bb448854.aspx Developer community forum • http://openxmldeveloper.org/ Open Source OpenXML/ODF converter (both ways) • http://sourceforge.net/projects/odf-converter/
  • 31. Microsoft ventures into open access chemistry Royal Society of Chemistry By Richard van Noorden January 29th, 2008 http://www.rsc.org/chemistryworld/News/2008/January/29010803.asp Computational chemists have secured funding from computing giant Microsoft to showcase how chemistry can benefit from open access data sharing on the internet. The two-year eChemistry pilot project represents 'a major test case' for proposed new protocols for sharing scholarly information over the web, said Lee Dirks, director of scholarly communications at Microsoft Research. Microsoft's support is also a boost for the small band of chemists keen to promote open access internet publishing. The public-private collaboration is one of many Microsoft projects to probe the potential of computing to advance scientific research, and bring back what they learn to improve the company's product line, Dirks told Chemistry World. 'But chemistry is a discipline we've not typically worked in,' he said. 'From everything I've heard, it's not as progressive a field as, say, astronomy in use of the web'. Most chemical information on the web is published in closed journals and databases which guarantee high quality but also require a subscription to view. Pre-print servers, collaborative documents, open databases, video sites, online lab notebooks and blogs provide other ways of communicating research. Combining the lot offers the enticing prospect of a vast, free-to-access repository. This could transform the sharing of scientific research if the disparate data sources were machine-readable, so that a search engine could automatically gather data about a particular molecule from a crystal structure, a movie, an online lab book, and an archived article, for example. Radical change The international standards required for this challenge are being developed by the Open Archives Initiative Object Reuse and Exchange Project (OAI-ORE), based at Cornell University, Ithaca, US. 
Their model protocols will be officially launched on 3 March at Johns Hopkins University in Maryland. The eChemistry project, Dirks explained, was chosen as an exemplar to show that the new standards are actually useful to scientists. Chemists and computer scientists at Cambridge and Southampton universities in the UK, and Indiana, Cornell, and Penn State in the US, will search and index existing online databases and print archives; and work out how best to record chemistry data captured in lab experiments. The results will be hosted by the US National Institutes of Health open access PubChem database and other repositories.
  • 32. http://chronicle.com/daily/2008/02/1585n.htm Monday, February 11, 2008 Researchers Develop Online Tools for Science Collaborations By LILA GUTERMAN Blogs, wikis, and social-networking sites such as Facebook may get media buzz these days, but for scientists, engineers, and doctors, they are not even on the radar. The most effective tools of the Internet for such people tend to be efforts more narrowly targeted to their needs, such as software that helps geneticists replicate one another's experiments. That was the underlying message of many presentations at the annual conference of the Professional/Scholarly Publishing Division of the Association of American Publishers held here last week. Philip E. Bourne, a professor of pharmacology at the University of California at San Diego, spoke about the Web site SciVee, where scientists can link videos to their research papers that appear in open-access biomedical journals (The Chronicle, August 21, 2007). Mr. Bourne, who created the site, calls the videos pubcasts; they are typically about 10 minutes long and go into more detail than an abstract but less than the full-length article. The videos are coming in at a trickle, says Mr. Bourne. (He attributes the slow rate to the high quality: the graduate students and postdoctoral researchers who make the videos have been crafting polished presentations.) But some of the ones already online have been viewed more than 100,000 times. When the pubcasts are uploaded, Mr. Bourne has also witnessed a steep increase in downloads of the linked article. Jill P. Mesirov described an application that she hopes will ultimately become mainstream for journals that publish computational science. Ms. Mesirov, director of computational biology and bioinformatics at the Broad Institute of Massachusetts Institute of Technology and Harvard University, has designed a way to make computational work repeatable by other scientists. 
The software, called GenePattern, stores both data and analytical routines. As the researcher works to collect and analyze the data, GenePattern records the steps the scientist has taken, so that anyone else can follow the steps and check the result or expand on the method using new data. Ms. Mesirov said that more than 6,000 people from more than 100 countries use the software. She is now working with Microsoft to link such information to manuscripts that could be published online by peer-reviewed journals, to give readers access to a researcher's computational methods. "One of the problems with publishing a paper that relies heavily on computational work," she said, "is that all of the methods that you would need to reproduce it never appear in the journal. If you're lucky, they're in the supplementary material [online]. How much better if the journal had a link to the paper which had the data and an instantiation of the method embedded right in that paper."
  • 33. How can we be sure we’ll remember our digital past? Christian Science Monitor By Chris Gaylord February 13th 2008 http://www.csmonitor.com/2008/0214/p13s02-stct.html Fading media, formats The problem of digital preservation reaches across two standards. There's the media – floppies, CDs, hard drives – and the format of the files themselves – does it run in DOS, Hypercard, ClarisWorks 2.0? Microsoft tackles this issue of "legacy" computing by running a kind of corporate museum. The company protects its multiplatform history by preserving old copies of "every major hardware and software change," says Lee Dirks, director of Scholarly Communications at Microsoft and a task force member. "We've got computers stored on campus that go back to the Altair, the first computer [to run Microsoft software]," he says. "In fact, we bought multiple copies of the Altair just in case." But maintaining antique computers is a costly way to keep the past alive. A concept that is gaining momentum, Mr. Dirks says, is emulation, where programmers trick modern computers into thinking the way their classic cousins did. This lets them run old software without retro machines. Another problem arises when the emulator itself is written for last generation's operating systems. Do you write an emulator to handle the original emulator? A more likely approach to long-term preservation is migration, says Berman. This calls for updating the file format every generation – without changing the contents, one hopes. This method has problems, as well. Some of the original context will be lost in translation, says Dirks. Also, the scale of the conversion will snowball as the number, size, and back-catalog of the files increases with each passing generation of technology.
  • 34. ICSTI Annual 2007 – Jun07 • Nature Asia-Pacific Summit – Jun07 • CODATA Summer School – Jul07 • DCC Annual 2007 – Dec07 • iSchool Conference 2008 – Feb08 • OAI-ORE Launch – Mar08 • BioMed Central 2007 Research Awards – Mar08 • Open Repositories 2008 – Apr08 • JCDL Annual 2008 – Jun08
  • 35. • “Global Research Library 2020” with University of Washington (Oct07 and Mar08) • Participating in two application(s) to the final round of the NSF “DataNet” solicitation (as an unfunded partner) • Sponsoring BioMed Central’s 2007 Research Awards (Mar08) • Aug07 Issue of CT Watch Quarterly (v. 3, no. 3) “The Coming Revolution in Scholarly Communications & Cyberinfrastructure” http://www.ctwatch.org/quarterly/articles/2007/08/ • New Scholarly Publishing website at: – http://www.microsoft.com/mscorp/tc/scholarly-publishing.mspx
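The file format points on slides 7 and 8 (all content expressed in XML, contained in a zip file, parts easy to access programmatically) can be illustrated with a minimal sketch that uses only the Python standard library. This is an illustration, not code from the talk: the helper names `build_package`, `list_parts`, and `paragraph_texts` are hypothetical, and the package built here is deliberately stripped down. It omits the relationship parts that a complete OPC package carries, so it is not expected to open in Word.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as w:p (paragraph) and w:t (text).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Content-types part: maps parts in the package to their media types.
CONTENT_TYPES_XML = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Main document part with two paragraphs; the first is split across two runs.
DOCUMENT_XML = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    '<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>OPC</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>'
    '</w:body></w:document>'
)


def build_package():
    """Build a stripped-down, illustrative .docx-style zip package in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES_XML)
        zf.writestr("word/document.xml", DOCUMENT_XML)
    return buf.getvalue()


def list_parts(data):
    """Return the names of all parts (zip entries) in the package."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


def paragraph_texts(data):
    """Extract the text of each paragraph from the main document part."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    ns = {"w": W_NS}
    return [
        "".join(t.text or "" for t in p.iterfind(".//w:t", ns))
        for p in root.iterfind(".//w:p", ns)
    ]


if __name__ == "__main__":
    package = build_package()
    print(list_parts(package))
    print(paragraph_texts(package))
```

The same `list_parts` and `paragraph_texts` functions should also work on the bytes of a real .docx file, since they rely only on the zip container and the `word/document.xml` part.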