Acs denver dirks potenzone 30 aug2011
1. Enriched research documents at the cutting edge:
When research papers no longer make sense on paper
Rudy Potenzone
SciencePoint Solutions
Lee Dirks
Education & Scholarly Communication
Microsoft Research | Connections
Presented at the American Chemical Society National Meeting
Denver CO, August 30, 2011
at the Skolnik Award Symposium in Honor of Sandy Lawson
2. Agenda
• Part 1 – The Scientific Paper
• Part 2 – Emergence of the ePaper
• Part 3 – Of Workflows and Add-ins
• Part 4 – Impact of the ePaper
• Part 5 – A Glimpse to the Future
4. A Brief History of Enriched Scientific Papers
• Research papers have long enjoyed the ability
to exist on paper with enriched content
• Embed figures and associated electronic
items
– chemical structures that included full bonding
and structural information
– Crystallographic databases
– Spectral databases
– Biological sequence and Pathway databases
– Supplemental material repositories
5. Issues with External Repositories
• Often not complete
• Poorly audited with some notable
exceptions
• References between the paper and the
files are often lost or incorrect
• There is a real loss of context due to the
separation of all the information
• Reproducibility is not certain!
6. Agenda
• Part 1 – The Scientific Paper
• Part 2 – Emergence of the ePaper
• Part 3 – Of Workflows and Add-ins
• Part 4 – Impact of the ePaper
• Part 5 – A Glimpse to the Future
7. My Bio – a Content Perspective
• The NIH/EPA Chemical Information System
– SANSS, MSSS, FRSS, etc.
• Chemical Abstracts Service
– CA, Registry, CASREACT, CHEMCATS, SciFinder
• MDL Information Systems/Elsevier
– ACD, various synthesis, Beilstein
• LION bioscience, Ingenuity Systems
– SRS and Ingenuity Pathway Analysis (IPA)
• CambridgeSoft
– ACX, etc.
9. On the Verge of a Major Revolution
• Technology that enables authors to create
elaborate versions of results of research
• Capturing the full context of research in
progress:
– The formal scientific report
– The very METHODS used
– Full data repository
– Complete workflows
• With the resulting documentation offering
information for completely reproducible
results
10. Envisioning a New Era of Research Reporting
• Reproducible Research
• Collaboration
• Interactive Data
• Dynamic Documents
• Reputation & Influence
11. Benefits of a Scientific ePaper
• Helping to improve the quality of science
• Facilitating the intellectual transfer of the core
discoveries
• Fully documenting the provenance of the research
• Preserving the knowledge with complete context
• Services easily accessible on top of the data
– a new value-added layer
– visualization and analysis
– discovery through simulation and modeling
– etc.
• Accessible Reproducible Research!!
12. Reproducible Research
Scientific publications have at
least two goals:
1. to announce a result and
2. to convince readers that the
result is correct.
3. Preservation of knowledge
Jill P. Mesirov. Accessible Reproducible Research. Science, Vol. 327, No. 5964, p. 415, 22 Jan 2010
(from http://www.sciencemag.org/cgi/content/full/327/5964/415/DC1)
13. [Diagram]
• Rich Original Content
• Workflow Process Content, Embedded
• Full Data Content, Embedded
• Fully Reproducible Content
• Content Sharing Services
• Driving Better Science
14. Agenda
• Part 1 – The Scientific Paper
• Part 2 – Emergence of the ePaper
• Part 3 – Of Workflows and Add-ins
• Part 4 – Impact of the ePaper
• Part 5 – A Glimpse to the Future
15. Redefining the Document
• Microsoft introduced their open document
format – OpenXML – in Office 2007
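As a rough illustration of why this format matters for enriched documents: an Open XML file is a ZIP package of XML parts, so domain data can travel inside the same file as the text. The sketch below builds a toy package in memory and lists its parts. The part names follow the usual layout, but the payloads are placeholders chosen for this example, and the package omits the relationship parts a real document needs, so it is not expected to open in Word.

```python
import io
import zipfile

# Illustrative parts only: a content-types stub, an empty document body,
# and a custom XML part holding an empty CML root as a stand-in for
# domain data. These payloads are assumptions made for this sketch.
PARTS = {
    "[Content_Types].xml": (
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"/>'
    ),
    "word/document.xml": (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body/></w:document>'
    ),
    "customXml/item1.xml": '<cml xmlns="http://www.xml-cml.org/schema"/>',
}


def build_package(parts):
    """Write each named XML part into an in-memory ZIP and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for name, xml_text in parts.items():
            package.writestr(name, xml_text)
    return buffer.getvalue()


def list_parts(package_bytes):
    """Return the part names stored in a ZIP package, in archive order."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.namelist()


def read_part(package_bytes, name):
    """Return one part of the package decoded as UTF-8 text."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.read(name).decode("utf-8")


if __name__ == "__main__":
    data = build_package(PARTS)
    for part_name in list_parts(data):
        print(part_name)
```

The same `list_parts` call can be pointed at the bytes of any real .docx file to see how its text, styles, and any embedded custom XML are stored as separate parts.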
16. Project "Chem4Word" – Chemical Drawing in Microsoft Word
Semantic chemistry for students and publishers
• Author/edit 1D and 2D chemistry. Change chemical layout styles.
• Intent: Recognizes chemical dictionary and ontology terms
• Relationships: Navigate and link referenced chemistry
• Data: Semantics stored in Chemistry Markup Language (CML)
• Intelligence: Verifies validity of authored chemistry
http://www.nytimes.com/2010/04/08/technology/personaltech/08askk.html?_r=1
<?xml version="1.0" ?>
<cml version="3" convention="org-synth-report"
     xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C" x2="-2.9149999618530273" y2="0.7699999809265137" />
      <atom id="a2" elementType="C" x2="-1.5813208400249916" y2="1.5399999809265137" />
      <atom id="a3" elementType="O" x2="-0.24764171819695613" y2="0.7699999809265134" />
      <atom id="a4" elementType="O" x2="-1.5813208400249912" y2="3.0799999809265137" />
      <atom id="a5" elementType="H" x2="-4.248679083681063" y2="1.5399999809265137" />
      <atom id="a6" elementType="H" x2="-2.914999961853028" y2="-0.7700000190734864" />
      <atom id="a7" elementType="H" x2="-4.248679083681063" y2="-1.907348645691087E-8" />
      <atom id="a8" elementType="H" x2="1.0860374036310796" y2="1.5399999809265132" />
    </atomArray>
    <bondArray>
      <bond atomRefs2="a1 a2" order="1" />
      <bond atomRefs2="a2 a3" order="1" />
      <bond atomRefs2="a2 a4" order="2" />
      <bond atomRefs2="a1 a5" order="1" />
      <bond atomRefs2="a1 a6" order="1" />
      <bond atomRefs2="a1 a7" order="1" />
      <bond atomRefs2="a3 a8" order="1" />
    </bondArray>
  </molecule>
</cml>
V1.0 now available (binary and open source)
http://research.microsoft.com/chem4word/
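Because the chemistry on this slide is stored as CML rather than as a picture, ordinary XML tooling can read it back. The sketch below is not part of Chem4Word. It is a minimal illustration using Python's standard library, with the slide's atoms and bonds copied in and the 2D coordinates omitted for brevity. The function names `hill_formula` and `summarize` are hypothetical helpers introduced here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace taken from the xmlns attribute of the slide's CML fragment.
NS = {"cml": "http://www.xml-cml.org/schema"}

# Atoms and bonds copied from the slide; x2/y2 coordinates left out.
CML_DOC = """\
<cml version="3" convention="org-synth-report"
     xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C" />
      <atom id="a2" elementType="C" />
      <atom id="a3" elementType="O" />
      <atom id="a4" elementType="O" />
      <atom id="a5" elementType="H" />
      <atom id="a6" elementType="H" />
      <atom id="a7" elementType="H" />
      <atom id="a8" elementType="H" />
    </atomArray>
    <bondArray>
      <bond atomRefs2="a1 a2" order="1" />
      <bond atomRefs2="a2 a3" order="1" />
      <bond atomRefs2="a2 a4" order="2" />
      <bond atomRefs2="a1 a5" order="1" />
      <bond atomRefs2="a1 a6" order="1" />
      <bond atomRefs2="a1 a7" order="1" />
      <bond atomRefs2="a3 a8" order="1" />
    </bondArray>
  </molecule>
</cml>
"""


def hill_formula(counts):
    """Format element counts in Hill order: C, then H, then the rest A-Z."""
    if "C" in counts:
        rest = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(f"{el}{counts[el] if counts[el] > 1 else ''}" for el in order)


def summarize(cml_text):
    """Parse a CML string and report formula, atom/bond counts, double bonds."""
    root = ET.fromstring(cml_text)
    molecule = root.find("cml:molecule", NS)
    atoms = molecule.findall("cml:atomArray/cml:atom", NS)
    bonds = molecule.findall("cml:bondArray/cml:bond", NS)
    counts = Counter(atom.get("elementType") for atom in atoms)
    double_bonds = [
        tuple(bond.get("atomRefs2").split())
        for bond in bonds
        if bond.get("order") == "2"
    ]
    return {
        "formula": hill_formula(counts),
        "atoms": len(atoms),
        "bonds": len(bonds),
        "double_bonds": double_bonds,
    }


if __name__ == "__main__":
    print(summarize(CML_DOC))
```

For the slide's molecule this reports the formula C2H4O2 with a single double bond between a2 and a4, which is the kind of check a validity-verifying tool could build on.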
17. GenePattern Reproducible Research Add-in
• Services: Connects to GenePattern database
• Relationships: Inline graphics are synchronized to dataset
• Data: Control and execute query pipelines into GenePattern
• Data: Resulting data (and provenance) stored within Word document
Source code and binary:
http://GenepatternWordAddin.codeplex.com
18. Research Information Centre (RIC) Project
Virtual Research Environment (VRE) Toolkit for SharePoint
Collaborative environment for
research groups
Personal site for each
researcher and project
site for each project
Document management,
federated search, social
networking, real-time
communication, blogs, wikis
Project Overview:
http://research.microsoft.com/ric/
http://research.microsoft.com/vre/
Version 1.1 (Open Source under Ms-PL):
http://ric.codeplex.com/
19. oreChem – The Chemical Semantic Web
• Lee Giles • Geoffrey Fox • Carl Lagoze • Jeremy Frey • Peter Murray-Rust
• Karl Mueller • Simon Coles • Jim Downing
• Prasenjit Mitra • Nico Adams
Demonstrating:
• Large collaboration project
focusing on interoperability
• At-source capture of
chemistry data
• Semantic storage
• Chemical structure search
• Compound object authoring
• Retrospective harvesting of
chemistry data
• Reuse through common ORE
data model
• Semantic authoring
• Virtualized triple storage
[Diagram labels: experiments, scientists, documents, measurements, text, data, molecules; compound document authoring; mash-up (re-use) of data]
20. Enabling the Chemical Semantic Web
“RSC Publishing and Southampton University drive
the chemical semantic web…”
21. Recent developments of interest
Elsevier's Article of the Future Competition
Grand Challenge & Article of the Future contest -- ongoing collaboration between
Elsevier and the scientific community to redefine how a scientific article is
presented online.
PLoS Currents: Influenza
In conjunction with NIH & Google Knol: a rapid research note service that provides
an open-access online resource for immediate, open communication and discussion
of new scientific data, analyses, and ideas in the field of influenza. All content is
moderated by an expert group of influenza researchers but, in the interest of
timeliness, does not undergo in-depth peer review.
Nature Precedings
Connects thousands of researchers and provides a platform for sharing new and
preliminary findings with colleagues on a global scale – via pre-print manuscripts,
posters and presentations. Claim priority and receive feedback on your findings
prior to formal publication.
Mendeley (and Papers)
Called “iTunes” for academic papers; 400,000+ users have signed up and a
staggering 30+ million scientific papers have been uploaded.
23. Harvard’s “Dataverse” Project
http://thedata.org
Via web application software, data citation standards, and statistical methods, the
Dataverse Network project increases scholarly recognition and distributed control
for authors, journals, archives, teachers, and others who produce or organize data;
facilitates data access and analysis for researchers and students; and ensures long-
term preservation whether or not the data are in the public domain. [From the
Institute of Quantitative Social Science (IQSS) at Harvard University]
25. Taverna
• Taverna is an open source and domain-
independent Workflow Management System
– A suite of tools used to design and execute scientific
workflows and aid in silico experimentation.
• Taverna has been created by the myGrid team and
funded through OMII-UK. The project has
guaranteed funding until 2014.
• The Taverna Suite is written in Java and includes
the Taverna Engine (used for enacting workflows)
that powers both the Taverna Workbench
(desktop client) and the Taverna Server.
26. More on Taverna
• Integrated with other myGrid tools
– social networking and workflow sharing
environment for scientists
– curated catalogue of Web services for Life
Sciences
27. Provenance
Log what, where, when, who
For data and for publications
[Diagram: a synthesis captured as a plan, a process, and a record]
Ingredient List
• Fluorinated biphenyl 0.9 g
• Br11OCB 1.59 g
• Potassium Carbonate 2.07 g
• Butanone 40 ml
To Do List
• Dissolve 4-fluorinated biphenyl in butanone
• Add K2CO3 powder
• Heat at reflux for 1.5 hours
• Cool and add Br11OCB
• Heat at reflux until completion
• Cool and add water (30ml)
• Extract with DCM (3x40ml)
• Combine organics, dry over MgSO4 & filter
Plan and Process
• Steps: Add, Reflux, Cool, Liquid-liquid extraction, Dry, Filter (Buchner)
• Actions on each step: Weigh, Measure, Annotate
Record (annotations attached to steps)
• 0.9031 grammes
• Butanone dried via silica column and measured into 100ml RB flask. Used 1ml extra solvent to wash out container.
• Started reflux at 13.30. (Had to change heater stirrer) Only reflux for 45min, next step 14:15.
• Inorganics dissolve 2 layers. Added brine ~20ml.
• Washed MgSO4 with DCM ~ 50ml
• Organics are yellow
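The "what, where, when, who" rule on this slide can be sketched as a small append-only log. This is a minimal illustration, not the tool shown in the diagram: the `ProvenanceRecord` class, the `log_step` and `to_json` helpers, and the location, researcher, and date values in the demo are all hypothetical placeholders. Only the step descriptions echo the slide's annotations.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """One immutable log entry answering what, where, when, and who."""
    what: str
    where: str
    when: str  # ISO 8601 timestamp
    who: str


def log_step(log, what, where, who, when=None):
    """Append a record to the log; default the timestamp to the current UTC time."""
    stamp = (when or datetime.now(timezone.utc)).isoformat()
    record = ProvenanceRecord(what=what, where=where, when=stamp, who=who)
    log.append(record)
    return record


def to_json(log):
    """Serialize the whole log so it can be stored alongside data or a paper."""
    return json.dumps([asdict(record) for record in log], indent=2)


if __name__ == "__main__":
    log = []
    # The date, location, and researcher below are placeholders for the demo.
    demo_time = datetime(2000, 1, 1, 13, 30, tzinfo=timezone.utc)
    log_step(log, "Weigh fluorinated biphenyl: 0.9031 g",
             "placeholder-lab", "placeholder-researcher", when=demo_time)
    log_step(log, "Started reflux; had to change heater stirrer",
             "placeholder-lab", "placeholder-researcher", when=demo_time)
    print(to_json(log))
```

Keeping records immutable and serializable is one simple way to let the same log travel with both the data set and the publication that cites it.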
28. myGrid Open Suite of Tools
• Workflow Repository
• Workflow GUI Workbench
• Client User Interfaces and 3rd party plug-ins
• Web Portals
• Service Catalogue
• Provenance Store
• Workflow Server
• Programming and APIs
• Activity and Service Plug-in Manager
• Open Provenance Model
• Secure Service Access, and Programming APIs
30. Project Trident
Built on Windows Workflow Foundation
Author, Execute and Monitor Workflows
Compose and modify workflows via drag & drop canvas
View data products, performance metrics, and provenance data
Version 1.2 (Open Source under Apache 2.0 License):
http://tridentworkflow.codeplex.com/
31. KNIME
• KNIME (Konstanz Information Miner)
• A user-friendly and comprehensive Open-
Source platform for:
– Data integration
– Processing
– Analysis
– Exploration
• Growing vendor adoption
– PerkinElmer, Schrödinger, Tripos, CCG,
ChemAxon, etc.
36. Envisioning a New Era of Research Reporting
Imagine…
• Live research reports
– multiple end-user ‘views’
– dynamically tailor presentations
• An authoring environment that
absorbs and encapsulates
– research workflows
– outputs from the lab experiments
• A report that can be dropped into
an electronic lab workbench and
reconstitute an entire experiment
• Dynamically mash up data and
workflows across experiments
• Apply new analyses and
visualizations and perform new in
silico experiments
[Diagram labels: Reproducible Research, Collaboration, Interactive Data, Dynamic Documents, Reputation & Influence]
37. Agenda
• Part 1 – The Scientific Paper
• Part 2 – Emergence of the ePaper
• Part 3 – Of Workflows and Add-ins
• Part 4 – Impact of the ePaper
• Part 5 – A Glimpse to the Future
38. Impact of These Innovations
• On Science
• On the Business of Science
• On the Scientific Community
• And Other Emotional Factors . . .
39. Overall Impacts
Authors will be somewhat inconvenienced to
learn new things . . . But as readers and
consumers it will clearly be beneficial!
Across Industry and Academia it will be
a positive advance
The vendors will be skeptical and reluctant to
change – but will move with the spending
community!
40. On the Scientific Community
• This will provide a significantly more
capable platform for science
– Extending collaboration
– Easing validation of research
– Offering transfer of knowledge and ease of
extension of research projects
• But it DOES further erode the status quo
system of rewards and tenure!
41. And Other Emotional Factors
Is There An Elephant In This Room??
• The Publishers??
• CAS?? Other A&I
companies??
• Well what about
Electronic Lab
Notebooks??
42. On the Business of Science
• Publishers will need to continue to evolve
to find a role as “cool provider” of these
tools and become a “hot” distribution
center
• A&I companies will need to redefine their
role
• Software vendors have a real opportunity,
if they can adapt . . .
43. The Value of the A & I Layers
Abstracting and Indexing in the Future
The Old Days
• Abstracting was Key
• True Assessment of Content
Today
• Indexing is Key
• Precision and Recall
• “Beats” Google every time
Going Forward
• Indexing with Context “Built-In”
• Will Abstracting or more correctly ‘Content Monitoring’ become the value add?
• Or be a reliable data aggregator?
44. Agenda
• Part 1 – The Scientific Paper
• Part 2 – Emergence of the ePaper
• Part 3 – Of Workflows and Add-ins
• Part 4 – Impact of the ePaper
• Part 5 – A Glimpse to the Future
45. Challenge or Opportunity
• Rich Content Sources
• Direct Search Tools
• Reproducible Science
• Complete Provenance
46. The Opportunity Before Us
• Faster Development in an Increasingly
Complex World
– Improving reproducibility of scientific results
– Data Sharing and collaboration services
– Reliable maintenance of provenance
– Faster availability and efficient query tools
– Secure and/or controlled access to data
– Finding related data and research partners
– Assurance that data will be preserved
• A Brave New World for Scientific Discovery
and Research
– Cross-domain partnerships
– Enhanced broad availability of data and prior
research
• Improved Knowledge Transfer
– Both upstream and downstream
– Realizing the promise of translational medicine
47. Thank You!
Rudy Potenzone
SciencePoint Solutions
rudy@sciencepoint.com
Lee Dirks
Education & Scholarly Communication
Microsoft Research | Connections
ldirks@microsoft.com or scholar@microsoft.com
URL – http://www.microsoft.com/scholarlycomm/
Facebook: Scholarly Communication at Microsoft