The document discusses challenges in online science publishing and proposes seven "known knowns" or points:
1. The internet has caused an information overload.
2. Science papers contain facts.
3. The traditional narrative research article format is outdated and needs replacing.
4. Words contain meaning that depends on context and knowledge.
5. Words and logic contain scientific facts.
6. These facts can be modeled using XML and RDF standards.
7. Publishers should stop producing so many traditional research papers.
1. Reinventing the Research Article -
Seven Challenges in
Science Publishing
Anita de Waard
Researcher Disruptive Technologies,
Elsevier Labs
NWO - Casimir Grantee,
Utrecht University
2. Seven ‘known knowns’ in online science publishing:
1. The internet has caused an information overload.
2. Science papers contain facts.
3. The narrative research article is outdated and needs to be
replaced.
4. Since words contain meaning,
5. And words (and logic) contain scientific fact,
6. We just need to model them with xml + rdf;
7. And the publishers should stop making all these papers.
3. 1. The internet has caused an information overload
- My own experience (as a researcher):
- Easy: find what I know exists
- OK: finding things I expect/hope exist
- Hard: making sure I haven’t missed anything
- However, none of these make me feel overwhelmed.
- Infuriating:
- Trying to respond to people who ask me something
- Managing three email accounts on 4 computers
- Following up on plans and projects
- However, we can improve the delivery of science content online.
4. 1. The internet has caused an information overload
- Pick (carve out) a first set of user needs, e.g.:
- Locate
- Understand
- Believe (Be convinced)
- Explore
- But this does not address WHAT you want to Locate, Understand, ...
- Semantic network in pharmacology: ‘Grey out what I already know’
1. How can we model a user’s interest?
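One way to read ‘Grey out what I already know’ is as a filter over a semantic network. The sketch below is only an illustration under that assumption: the function name `grey_out`, the triple layout, and the placeholder concepts are all hypothetical and are not taken from any existing system.

```python
def grey_out(network, known):
    """Split the edges of a concept network into greyed and highlighted.

    An edge is greyed only when the user already knows both of its ends;
    anything touching an unknown concept stays highlighted.
    """
    greyed, highlighted = [], []
    for source, relation, target in network:
        if source in known and target in known:
            greyed.append((source, relation, target))
        else:
            highlighted.append((source, relation, target))
    return greyed, highlighted


# Placeholder data: a user "interest model" reduced to a set of known concepts.
network = [
    ("concept A", "relates to", "concept B"),
    ("concept B", "relates to", "concept C"),
]
known = {"concept A", "concept B"}

greyed, highlighted = grey_out(network, known)
print("greyed:", greyed)
print("highlighted:", highlighted)
```

A set of known concepts is the simplest possible stand-in for a user's interest; it says nothing about why the user wants to Locate or Understand something, which is the open question on this slide.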
5. 2. Science papers contain facts
- With FEBS Letters Editorial Office in Heidelberg/
MINT Database in Rome
- Structured Digital Abstract [Gerstein et al.]: ‘machine-readable
XML summary of pertinent facts’
- For FEBS: provide proteins, methods, protein-protein interactions,
as given in MINT:
- 2008: authors provide, editors check
- 2009: Word Plug-in tool suggests, authors (and editors) check
2. Can we create an ontology of doubt?
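To make ‘machine-readable XML summary of pertinent facts’ concrete, here is a minimal sketch of how such a summary could be assembled in Python. The element names (`structuredDigitalAbstract`, `protein`, `method`, `interaction`) and the sample values are placeholders invented for illustration; they are not the FEBS Letters or MINT schema.

```python
import xml.etree.ElementTree as ET


def build_sda(proteins, methods, interactions):
    """Build a small XML summary of the facts an author asserts."""
    root = ET.Element("structuredDigitalAbstract")
    for name in proteins:
        ET.SubElement(root, "protein").text = name
    for name in methods:
        ET.SubElement(root, "method").text = name
    for first, second, method in interactions:
        # One element per protein-protein interaction, with the detection method.
        ET.SubElement(
            root, "interaction", {"a": first, "b": second, "method": method}
        )
    return root


# Placeholder values only; a real abstract would carry database identifiers.
sda = build_sda(
    proteins=["ProteinA", "ProteinB"],
    methods=["example-method"],
    interactions=[("ProteinA", "ProteinB", "example-method")],
)
print(ET.tostring(sda, encoding="unicode"))
```

In the 2008 workflow described above the author would supply these values and an editor would check them; in the 2009 workflow a tool would propose them first.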
6. 2. Science papers contain facts
2. Can we create an ontology of doubt?
7. 3. The narrative RA should be replaced
| Aristotle | English term | Quintilian | Description | Cell | APA Style Guide |
|---|---|---|---|---|---|
| prooimion | Introduction | exordium | The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal of ethos in order to establish credibility with the audience. | Introduction | Introduction |
| prothesis | Statement of Facts | narratio | The second part of a classical oration, following the introduction or exordium. The speaker here provides a narrative account of what has happened and generally explains the nature of the case. Quintilian adds that the narratio is followed by the propositio, a kind of summary of the issues or a statement of the charge. | Introduction | Introduction |
|  | Summary | propositio | Coming between the narratio and the partitio of a classical oration, the propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Abstract | Abstract |
|  | Division/outline | partitio | Following the statement of facts, or narratio, comes the partitio or divisio. In this section of the oration, the speaker outlines what will follow, in accordance with what's been stated as the status, or point at issue in the case. Quintilian suggests the partitio is blended with the propositio and also assists memory. | Article Outline | Table of Contents |
| pistis | Proof | confirmatio | Following the division / outline or partitio comes the main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here. | Results | Methods, Results |
|  | Refutation | refutatio | Following the confirmatio or section on proof in a classical oration, comes the refutation. As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. | Discussion | Discussion |
| epilogos |  | peroratio | Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up (see the figures of summary, below). | Discussion | Discussion |
8. 3. The narrative RA should be replaced
- Narrative is how stories are told; ‘the truth can only be told in stories’....
- Narrative is essential for persuasion

| The Story of Goldilocks and the Three Bears | Story Grammar | Paper | The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins |
|---|---|---|---|
| Once upon a time | Setting: Time | Background | The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged. |
| a little girl named Goldilocks | Characters | Objects of study | the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract, |
| She went for a walk in the forest. Pretty soon, she came upon a house. | Location | Experimental setup | studied and compared in vivo effects and interactions to those of the human protein |
| She knocked and, when no one answered, | Theme: Goal | Research goal | Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood. |
| she walked right in. | Attempt | Hypothesis | Atx-1 may play a role in the regulation of gene expression |
| At the table in the kitchen, there were three bowls of porridge. | Episode 1: Name | Episode 1: Name | dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies |
| Goldilocks was hungry. | Subgoal | Subgoal | test the function of the AXH domain |
| She tasted the porridge from the first bowl. | Attempt | Method | overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1. |
| This porridge is too hot! she exclaimed. | Outcome | Results | Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells |
| So, she tasted the porridge from the second bowl. |  | Data | (data not shown), |
| This porridge is too cold, she said | Outcome | Results | both genotypes show many large holes and loss of cell integrity at 28 days |
| So, she tasted the last bowl of porridge. |  | Data | (Figures 1B-1D). |

3. How can we represent narrative online?
9. 4. Words contain meaning
Sicilian?
- ‘A word is worth a thousand pictures’ (Don Loritz)
- The meaning of words occurs in context and is dependent
on knowledge and experience
- This is even more so in science:
PSA = Prostate-Specific Antigen or Pot Smokers
Association of America?
10. 4. Words contain meaning
- Cognitive linguistics: language and cognition cannot
be separated - language acts are cognitive acts
- Lakoff, metaphor: ‘anger is heat’
- Meaning is created in the mind:
a word is not (only) a ‘particle’ but (also) a ‘wave’:
Hearing/reading is not unpacking a package, but
resonating at a specific frequency - context is its
medium - context-free language does not exist!
4. How do we model cognitive context?
11. 5. Words (and logic) contain scientific fact
• “[Y]ou can transform a fact into fiction or a fiction into fact just by
adding or subtracting references [and data]”
– Bruno Latour, ‘Science in Action’, 1987
“We generated an MCF-7 derivative that expresses the HPV16 E6 protein, which
mediates degradation of p53 ([24]).”
24. M. Scheffner, B.A. Werness, J.M. Huibregtse, A.J. Levine and
P.M. Howley, The E6 oncoprotein encoded by human papillomavirus types 16 and 18
promotes the degradation of p53. Cell 63 (1990), pp. 1129–1136.
“In the presence of E6, p53 stabilization in response to IR was almost
completely prevented in MCF-7 cells (Figure 1A).”
Figure 1. Initiation and Maintenance of G1 Arrest Induced by IR.
(A) Stable MCF-7 clones containing either pCDNA3.1 (Neo)
or pCDNA3.1-E6 were irradiated (20 Gy), and cellular protein
extracts were made 2 hr later, separated on 10% SDS PAGE,
and immunoblotted to detect p53 and cyclin D1 proteins.
12. 5. Words (and logic) contain scientific fact
- Main goal of article is to persuade
- The author is a medium that enables the article to get itself
published (à la selfish gene/meme)
- Essential persuasive elements are non-textual
5. How do we represent non-textual elements?
13. Discourse Segments
- “A text is made up of Discourse Segments
and the relations between them” - Grosz and
Sidner, Mann-Thompson, Marcu, Swales
- Discourse Segment Purpose: element that
has a consistent rhetorical/pragmatic goal.
- Define for Biological Research Article
14. 6. Just model the facts with xml + rdf
A model of a biology research article:
<EXPERIMENTS>
  <Experiment>
    <Header header="h1">p53-Independent Initiation of G1 Arrest Induced by IR</Header>
    <Fact fact="fa1" factref="br26">Since the transcriptional response by p53 is a relatively slow process,</Fact>
    <Problem problem="p1">we asked whether initiation of a G1 arrest following genotoxic stress requires p53. </Problem>
    <Method method="m1">We generated an MCF-7 derivative </Method>
    <Fact fact="fa2" factref="br24">that expresses the HPV16 E6 protein, which mediates degradation of p53 (<Bibref bib="br24">[24]</Bibref>).</Fact>
    <Result result="r1">In the presence of E6, p53 stabilization in response to IR was almost completely prevented in MCF-7 cells (<Figref figref="agami1.gif">Figure 1A</Figref>).</Result>
    <Result result="r2">Consistent with this, no induction of p21cip1 by IR was seen in the E6-expressing MCF-7 cells (<Figref figref="none.gif">data not shown</Figref>).</Result>
    ...
  </Experiment>
</EXPERIMENTS>
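A rough sketch of how a model like this could be consumed: the Python below parses a shortened, well-formed copy of the fragment and groups the segment text by element name. The helper `segments_by_type` is hypothetical; only the tag and attribute names come from the slide.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the slide's fragment, closed so that it parses.
FRAGMENT = """
<Experiment>
  <Header header="h1">p53-Independent Initiation of G1 Arrest Induced by IR</Header>
  <Problem problem="p1">we asked whether initiation of a G1 arrest following genotoxic stress requires p53.</Problem>
  <Method method="m1">We generated an MCF-7 derivative</Method>
  <Fact fact="fa2" factref="br24">that expresses the HPV16 E6 protein, which mediates degradation of p53 (<Bibref bib="br24">[24]</Bibref>).</Fact>
  <Result result="r1">In the presence of E6, p53 stabilization in response to IR was almost completely prevented in MCF-7 cells (<Figref figref="agami1.gif">Figure 1A</Figref>).</Result>
</Experiment>
"""


def segments_by_type(xml_text):
    """Map each segment type (tag name) to the list of its text passages."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for segment in root:
        # itertext() also collects text inside nested Bibref/Figref elements.
        text = " ".join("".join(segment.itertext()).split())
        grouped[segment.tag].append(text)
    return dict(grouped)


segments = segments_by_type(FRAGMENT)
for kind, texts in segments.items():
    print(kind, "->", texts)
```

Grouping by tag recovers the rhetorical roles (Problem, Method, Fact, Result) but not the relations between them, which is the gap the later slides return to.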
19. Discourse: A Fact(ory)
[Diagram. Four realms: hypothetical realm (might, would); realm of activity (to test, to see); realm of experience (past); realm of models (present). Node labels: fact, problem, hypothesis, goal, method, result, implication. Section labels: introduction, results, discussion. Connecting phrases: ‘incongruity/ignorance’, ‘to’, ‘we’, ‘resulting in’, ‘suggests that’. Further labels: Shared view, Own view.]
20. 6. Just model the facts with xml + rdf
- In practice: ScienceDirect does not use our XML... (shhh....)
- At Elsevier: Project Harpoon: ‘stab’ the document with metadata,
asynchronous, linked in (XPath/XQuery), distributed
- In XML - how to access a phrase inside an article:
- access inside a PDF by coordinates? Format, content changes
- add IDs to every single element? Format, content, version
changes?
- How to represent relations, even if we know where they link?
6. How can we better model discourse elements (and
relations)?
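The ‘stab the document with metadata’ idea can be sketched as stand-off annotation: metadata is kept outside the article and points into it with a path expression. This is an assumption about the general approach, not a description of Project Harpoon itself; the annotation layout, the `resultsIn` predicate, and the `resolve` helper are hypothetical. Python's `xml.etree.ElementTree` supports only a limited XPath subset, which is enough here.

```python
import xml.etree.ElementTree as ET

ARTICLE = """
<Experiment>
  <Method method="m1">We generated an MCF-7 derivative</Method>
  <Result result="r1">In the presence of E6, p53 stabilization in response to IR was almost completely prevented in MCF-7 cells</Result>
</Experiment>
"""

# Stand-off annotations live outside the article; each points in with a path.
ANNOTATIONS = [
    {"target": ".//Method[@method='m1']", "property": "segmentType", "value": "method"},
    {"target": ".//Result[@result='r1']", "property": "segmentType", "value": "result"},
]

# Relations as subject-predicate-object triples, in the spirit of RDF.
RELATIONS = [(".//Method[@method='m1']", "resultsIn", ".//Result[@result='r1']")]


def resolve(root, path):
    """Return the single element a path points at, or None if it no longer matches."""
    matches = root.findall(path)
    return matches[0] if len(matches) == 1 else None


root = ET.fromstring(ARTICLE)
resolved = [(a["property"], a["value"], resolve(root, a["target"])) for a in ANNOTATIONS]
for prop, value, element in resolved:
    status = element.text if element is not None else "UNRESOLVED"
    print(prop, "=", value, "->", status)

# A pointer to an ID that is absent from this version simply fails to resolve.
dangling = resolve(root, ".//Result[@result='r2']")
print("r2 resolved:", dangling is not None)
```

The last two lines show the weakness raised in the bullets above: any pointer, whether a coordinate or an element ID, can break when format, content, or version changes.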
21. 7. And publishers should stop making all those papers.
- 6 uses of a RA:
- job application
- report card
- thesis
- conference tickets
- research assessment
- and yes, by the way, reporting on scientific work.
- Scientists are evaluated largely based on publications:
this enables their production to be evaluated by non-
specialists
- This places an undue stress on quantity, conformity (for
risk of being rejected), publishing for its own sake.
7. How can we disentangle communication and
evaluation?
22. Seven ‘Known Unknowns’ in Online Science Publishing
1. How can we model a user’s interest?
2. Can we create an ontology of doubt?
3. How can we represent narrative online?
4. How do we model cognitive context?
5. How do we represent non-textual elements?
6. How can we better model discourse elements and relations?
7. How can we disentangle communication and evaluation?
23. http://www.elseviergrandchallenge.com/
The Elsevier Grand Challenge: Knowledge Enhancement in the Life Sciences is a contest created to
improve the way scientific information is communicated and used. The contest invites members of the
scientific community to describe and prototype a tool to improve the interpretation and identification of
meaning in (online) journals and text databases relating to the life sciences.
Specifically we are looking for new ways to:
1. improve the process/methods/results of creating, reviewing and editing scientific content
2. interpret, visualize or connect the knowledge more effectively, and/or
3. provide tools/ideas for measuring the impact of these improvements.
Abstracts are now invited - Submissions will close on July 15th, 2008.
- Finalists will be invited to present their vision papers in a public symposium, at which the Panel of
Judges will announce the winners.
- The first place winner will be awarded a cash prize of US$35,000.
- The second place winner will be awarded a cash prize of US$15,000.
- All finalists will receive free trial access to ScienceDirect and Scopus for a year.
24. Unknown unknowns?
Would you care to correct/contradict/join me?
Anita de Waard,
http://people.cs.uu.nl/anita
anita@cs.uu.nl.