Slides describing Force11 Work and background of several of the speakers, used for talks to University of Lethbridge, Carnegie Mellon and to Elsevier internally
How to Execute A Research Paper
1. How To Execute
The Research Paper
Anita de Waard
Disruptive Technologies Director
Elsevier Labs
Pittsburgh, April 2012
2. Outline
• Ten people who are changing scholarly publishing:
– New forms
– Workflow/data integration
– New models of business/attribution
• So what does this mean?
• Some projects to help us move towards these new
models:
– Claim-evidence networks
– Workflow/data integration and executable papers
– Creating a community of practice
3. Theme 1: New forms of publication
• Main issue: the format of the scientific paper comes
from a time when our communication was paper-
centric
• Solution: Rethink the unit and form of the scholarly
publication from the ground (i.e., the experiment) up
• Three projects doing that:
4. Steve Pettifer, U Manchester
• Utopia: ‘Everything you always wanted to do
with a PDF….’: interactive, sharable
• Working on integration with DOMEO to
add/share annotations
• Final goal: don’t ‘reconstruct the cow from a
hamburger’: include workflows and models
5. Gully Burns, USC ISI
• KEfED: model of research as an activity
• Map out dependent/independent variables within an
experiment and model them
• Start: appendix to paper; later: precede paper, graft
paper on top of model.
6. Tim Clark, Harvard/MGH
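[Figure: example claim graph, with graph labels G1, G5 and G6:
<http://tinyurl.com/4h2am3a> rdf:type swande:Claim
<http://tinyurl.com/4h2am3a> dct:title “Intramembranous Aβ behaves as chaperones of other membrane proteins”
<http://tinyurl.com/4h2am3a> pav:contributedBy <http://example.info/person/1>
<http://tinyurl.com/4h2am3a> swanrel:referencesAsSupportiveEvidence <http://example.info/citation/1>]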
• Annotation ontology allows you to trace claims
• DOMEO offers interface to do both automated
entity markup + manual mark up of
claim/evidence networks
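Tracing a claim in the example graph on this slide can be sketched in a few lines of plain Python. This is only an illustration, not DOMEO's or the Annotation Ontology's actual API: the pairing of predicates with objects is read off the slide's figure labels, and the helper names `objects` and `trace_claim` are hypothetical.

```python
# Triples as read from the slide's example graph (the pairing is an assumption).
CLAIM = "http://tinyurl.com/4h2am3a"

TRIPLES = [
    (CLAIM, "rdf:type", "swande:Claim"),
    (CLAIM, "dct:title",
     "Intramembranous Aβ behaves as chaperones of other membrane proteins"),
    (CLAIM, "pav:contributedBy", "http://example.info/person/1"),
    (CLAIM, "swanrel:referencesAsSupportiveEvidence",
     "http://example.info/citation/1"),
]


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def trace_claim(triples, claim):
    """Collect the title, contributors and supporting evidence of a claim."""
    return {
        "title": objects(triples, claim, "dct:title"),
        "contributors": objects(triples, claim, "pav:contributedBy"),
        "evidence": objects(
            triples, claim, "swanrel:referencesAsSupportiveEvidence"
        ),
    }


if __name__ == "__main__":
    print(trace_claim(TRIPLES, CLAIM))
```

A real system would likely use an RDF library and a triple store; flat tuples are used here only to keep the sketch self-contained.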
7. Theme 2: data and workflow integration
• Issues:
– Format of the research paper hard to integrate within a
scientific/clinical workflow
– Hard to reproduce/deduce: what methods were used and
what data was created for a piece of research, making
reproduction or even review difficult
• Some solutions for sharing workflows and data:
9. Phil Bourne, UCSD
• Big need: keep track of the data in my lab!
• Other need: know what I did and what other people
did. Yolanda Gil made a workflow representation;
it was hard to remember what we did…
• Need: better ways to record, share, archive what
we did.
• New role for the publisher
10. Deborah McGuinness, RPI
• Future Web:
• ‘if everything is everywhere, how do we find
it/know what we want?’
• Internet, Web, Grid, Cloud, Semantic Grid
Middleware
• Xinformatics:
• Where X = geo, eco, econo…
• Linked Data to Semantics
• Semantic Foundations:
• Pushing the boundaries of
Semantic Web standards
• Ontology evolution
11. Theme 3: New Models for Access/Attribution
• Issues:
– User-created content, crowdsourcing means (scientific)
impact is measured very differently from the past
– Need new models for copyright/IP
– Citizen scientists participate as well
• Some efforts to address this:
12. Paul Groth, VU Amsterdam
Altmetrics: “the creation and study of
new metrics based on the Social Web
for analyzing and informing scholarship.”
Including:
- Downloads
- Where readers read
- Data citation
- Social network diffusion
- Slide reuse
- Peer review contributions
- YouTube views
13. Leslie Chan, U. Toronto Scarborough
• ElPub conference series that focuses
on globally connecting information scientists
• Bioline International system “a not-for-profit
scholarly publishing cooperative committed to
providing open access to quality research journals
published in developing countries”:
14. John Wilbanks, Kauffman/CC
• As data becomes more accessible, need:
• raw metadata
• standards processes
• consensus processes
• document submission standards
• data archives
• Ways of governing access:
• Privacy vs. IP vs. policies
• Technology only helps so much…
• This is mostly a social/policy issue
15. Cameron Neylon, Cambridge
• Main arguments for Open Access:
• Citizen science is becoming more important
• Science changes when it is crowdsourced:
Tim Gowers: ‘This is to normal research as
driving is to pushing a car’
• Three principles:
• Scale and connectivity
• Reduced friction to access
• Demand-side filters
16. In summary, scientists are working on:
• Tools for knowledge…
– Visualisation (Steve Pettifer)
– Modeling (Gully Burns)
– Annotation (Tim Clark)
• Ways to link to
– Workflows (Dave De Roure)
– Lab data (Phil Bourne)
– Linked research data (Deborah McGuinness)
• And models for
– Attribution/credit (Paul Groth)
– Allowing new players to participate (Leslie Chan)
– Copyright/IP rights (John Wilbanks)
– Networked science (Cameron Neylon).
17. New roles for publishers and libraries
• Technically, there is no reason to publish in a
journal, or for that matter, to publish a paper at all:
• Perhaps a good blog post linked to workflows and
data with some validation from peers and good
download statistics might work just as well?
• Is publishing in journals mostly a habit?
18. “Publishers have been thinking we’re going out of
business for 20 years, what has suddenly changed?”
The internet! Not the technical web, but the social web….
‘The value of a […] network is proportional to the square of
the number of users of the system (n²)’ Metcalfe’s Law
1990’s:
Big Player
2000’s:
Medium Participant
2015:
Irrelevant!
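As a worked illustration of the Metcalfe's Law quote on this slide, assume a constant of proportionality k (not given on the slide). Doubling the number of users then quadruples the value of the network:

```latex
V(n) = k\,n^{2}
\quad\Longrightarrow\quad
\frac{V(2n)}{V(n)} = \frac{k\,(2n)^{2}}{k\,n^{2}} = 4
```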
19. What do we need?
[1] Bleecker, J. ‘A Manifesto for Networked Objects: Cohabiting with Pigeons, Arphids and Aibos in the Internet of Things’,
http://nearfuturelaboratory.com/2006/02/26/a-manifesto-for-networked-objects/
[2] Bechhofer, S., De Roure, D., Gamble, M., Goble, C. and Buchan, I. (2010) ‘Research Objects: Towards Exchange and
Reuse of Digital Knowledge’. In: The Future of the Web for Collaborative Science (FWCS 2010), April 2010, Raleigh, NC, USA.
http://precedings.nature.com/documents/4626/version/1
[3] Neylon, C. ‘Network Enabled Research: Maximise scale and connectivity, minimise friction’,
http://cameronneylon.net/blog/network-enabled-research/
Internet of Things: (Bleecker, [1])
Interact with ‘objects that blog’ or ‘Blogjects’, that:
• track where they are and where they’ve been;
• have histories of their encounters and experiences;
• have agency - an assertive voice on the social web [1]
Research Objects: (Bechhofer et al., [2])
Create semantically rich aggregations of resources,
that can possess some scientific intent or support
some research objective
Networked Knowledge: (Neylon, [3])
If we care about taking advantage of the web and
internet for research then we must tackle the building
of scholarly communication networks.
These networks will have two critical characteristics:
scale and a lack of friction. [3]
20. Some examples of networked science:
• Galaxy Zoo: citizen science: classify galaxies in the
comfort of your own home – like Hanny!
• Tim Gowers, Polymath: “…the real contributors will be
the process owners and project leaders that are able to
provide horizontal leadership. To support this shift,
organizations will need to reward and recognize
horizontal contributions as much, if not more, than
hierarchical positions.”
• MathOverflow: virtual network of mathemagicians
working collectively to answer big, small, clear and
fuzzy questions
22. Wrapping a story around your data:
Concept developed with Ed Hovy, Phil Bourne,
Gully Burns and Cartic Ramakrishnan
1. Research: Each item in the system has metadata (including
provenance) and relations to other data items added to it.
2. Workflow: All data items created in the lab are added to a
(lab-owned) workflow system.
3. Authoring: A paper is written in an authoring tool which can pull
data with provenance from the workflow tool in the appropriate
representation into the document.
4. Editing and review: Once the co-authors agree, the paper is
‘exposed’ to the editors, who in turn expose it to reviewers.
Reports are stored in the authoring/editing system, the paper gets
updated, until it is validated.
5. Publishing and distribution: When a paper is published, a
collection of validated information is exposed to the world. It
remains connected to its related data item, and its heritage can
be traced.
6. User applications: distributed applications run on this
‘exposed data’ universe.
[Figure: diagram of the six steps. Labels: ‘metadata’ attached to each data item; ‘Review’, ‘Edit’, ‘Revise’; ‘Some other publisher’. Sample paper text: ‘Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-’]
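The research, workflow, authoring, review and publishing steps on this slide can be sketched as a small data model. This is a minimal illustration under stated assumptions, not the system the concept's authors built: the class names `DataItem` and `Paper`, the `publish` function and the sample provenance string are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """One item in the lab-owned workflow system (steps 1 and 2)."""
    item_id: str
    content: str
    provenance: str                                 # who or what produced it
    related: list = field(default_factory=list)     # ids of related items


@dataclass
class Paper:
    """A paper whose statements stay linked to data items (steps 3 to 5)."""
    title: str
    statements: list = field(default_factory=list)  # (text, item_id) pairs
    validated: bool = False

    def cite_data(self, text, item):
        # Step 3: pull a data item into the text, keeping the link to it.
        self.statements.append((text, item.item_id))

    def validate(self):
        # Step 4: stands in for the whole edit/review/revise loop.
        self.validated = True


def publish(paper, workflow):
    """Step 5: expose validated statements together with their provenance."""
    if not paper.validated:
        raise ValueError("only validated papers are exposed")
    return [
        {"text": text, "data": item_id,
         "provenance": workflow[item_id].provenance}
        for text, item_id in paper.statements
    ]


if __name__ == "__main__":
    # Hypothetical data item; the sentence is the sample text from the slide.
    workflow = {"fig2": DataItem("fig2", "test results", "hypothetical lab run")}
    paper = Paper("Example paper")
    paper.cite_data("Rats were subjected to two grueling tests", workflow["fig2"])
    paper.validate()
    print(publish(paper, workflow))
```

Because each published statement keeps the id of its data item, an application in step 6 could follow that id back into the workflow system to trace the statement's heritage.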
23. Creating claim-evidence networks:
• DOMEO: connect to Science Direct
• Rich Boyce’s Drug-drug interactions: tracing
heritage of claims
• Foundation for that: linguistic markers for identifying
cited/own knowledge:
24. How a claim becomes a fact:
• Voorhoeve, 2006: “These miRNAs neutralize p53-mediated CDK
inhibition, possibly through direct inhibition of the expression of the
tumor-suppressor LATS2.”
• Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and
miR-373 were found to allow proliferation of primary human cells that
express oncogenic RAS and active p53, possibly by inhibiting the tumor
suppressor LATS2 (Voorhoeve et al., 2006).”
• Yabuta et al., 2007: “[On the other hand,] two miRNAs, miRNA-372 and
-373, function as potential novel oncogenes in testicular germ cell
tumors by inhibition of LATS2 expression, which suggests that Lats2 is an
important tumor suppressor (Voorhoeve et al., 2006).”
• Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373,
directly inhibit the expression of Lats2, thereby allowing tumorigenic
growth in the presence of p53 (Voorhoeve et al., 2006).”
25. Working on ontology:
1. Add to formal knowledge representations, e.g. Biological
Expression Language add {V = 3, S = N, B = 0}:
• SET Evidence = "Arterial cells are highly susceptible to oxidative stress, which can induce
both necrosis and apoptosis (programmed cell death) [1,2]"
• biologicalProcess(GO:"response to oxidative stress") increases
biologicalProcess(GO:"apoptotic process")
• biologicalProcess(GO:"response to oxidative stress") increases
biologicalProcess(GO:necrosis)
2. Improve triple search engines, e.g. compare in iHop:
• The Lats2 tumor suppressor protein has been implicated earlier in promoting p53
activation in response to mitotic apparatus stress {V = 2, S = NN, B = 0}
• Our findings reveal that miR-373 would be a potential oncogene and it participates in
the carcinogenesis of human esophageal cancer {V = 1/2?, S = A, B = D}
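The `{V = …, S = …, B = …}` tags on this slide can be separated from their sentences mechanically before being attached to a formal statement. The sketch below is an assumption about how that could be done, not the project's tooling: the function name `parse_annotation` is hypothetical, and because the slide does not define V, S and B, the values are kept as opaque strings.

```python
import re

# Matches one brace-delimited tag such as "{V = 2, S = NN, B = 0}".
TAG = re.compile(r"\{([^{}]*)\}")


def parse_annotation(text):
    """Split 'sentence {V = 2, S = NN, B = 0}' into (sentence, fields)."""
    match = TAG.search(text)
    if match is None:
        return text.strip(), {}
    fields = {}
    for part in match.group(1).split(","):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    sentence = (text[:match.start()] + text[match.end():]).strip()
    return sentence, fields


if __name__ == "__main__":
    example = ("Our findings reveal that miR-373 would be a potential oncogene "
               "{V = 1/2?, S = A, B = D}")
    print(parse_annotation(example))
```

Keeping uncertain values such as `1/2?` as strings avoids forcing a decision the annotator has not made.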
26. Application: Elsevier/Philips Use Case:
3 Content Sources, 2 Link Steps
A. Philips’ Electronic Patient Records
B. Elsevier-published
Clinical Guideline
C. Elsevier (or other publisher’s)
Research Report or Data
Step 1: Patient data +
diagnosis link to Guideline
recommendation
Step 2: Guideline recommendation
links to research report/data
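The two link steps of this use case amount to two chained lookups. The sketch below only illustrates that shape; every identifier in it is a hypothetical placeholder, and nothing here reflects how Philips' or Elsevier's systems are actually built.

```python
# All identifiers below are hypothetical placeholders, not real records.
GUIDELINE_FOR_DIAGNOSIS = {
    "diagnosis-A": "guideline-rec-1",
}
REPORTS_FOR_GUIDELINE = {
    "guideline-rec-1": ["research-report-1", "dataset-1"],
}


def follow_links(patient_record):
    """Step 1: record -> guideline recommendation; step 2: -> reports/data."""
    recommendation = GUIDELINE_FOR_DIAGNOSIS.get(patient_record["diagnosis"])
    if recommendation is None:
        return None, []
    return recommendation, REPORTS_FOR_GUIDELINE.get(recommendation, [])


if __name__ == "__main__":
    print(follow_links({"patient": "patient-1", "diagnosis": "diagnosis-A"}))
```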
28. FORCE11 Community of Practice
• Workshop in August of 2011: 35 invited attendees from different
parts of science, industry, funding agencies, data centers
• Goal: map main obstacles preventing new models of science
publishing and develop ways to overcome them
• Just received funding from the
Sloan Foundation to:
• Start online community
• Hold next workshop
• Collaboratively work on
new efforts
29. Summary:
• Ten people who are changing scholarly publishing:
• We (publishers, editors, libraries, etc) need to revisit if
and how we are needed
• Some projects are underway to help us move towards
these new models:
– Networked science
– Workflow/data integration
– Identifying claim-evidence trails
• ….happy to collaborate on others!
http://elsatglabs.com/labs/anita
a.dewaard@elsevier.com
Editor's notes
This is reflected in a third distinctive – the pack. This is Paul Fisher's pack from the Tryps example.
Some packs contain example input and output data so workflows can be checked for “decay” (they don’t actually rot, but the world changes round them).
While others are looking at semantically enhanced publication, we are asking “what is the shared artefact of future research?” We come at the same problem from the other side. We have it surrounded! Our approach relieves us of the paper mindset – so, for example, a Research Object could contain information for many audiences and purposes, with a commonly interpreted core (social scientists will recognise the idea of a “boundary object”).