This document provides a brief biography of the author and outlines their perspective on the complexity of biological systems and gene expression. It notes that a single specimen or species can show significant variability, and that gene expression varies based on factors like age, environmental stimuli, nutritional state, and interactions with other organisms like gut microbes. It argues that fully understanding biological systems requires considering all of these sources of variability and the interactions between different elements. The author's new role focuses on facilitating collaborations to better represent scientific knowledge by connecting experimental data across studies in a way that can help disentangle some of this complexity.
Why life is so complicated
1. Incidental Collaboratories For Experimental Data, Or: Why life is so complicated (and what we might be able to do about it)
Anita de Waard
VP Research Data Collaborations, Elsevier RDS
Jericho, VT, USA
2. Outline
• Brief bio
• The problem: life is complicated
• What we can do to understand it
• About Elsevier Research Data Services
• A pilot project
• Some questions.
3. Brief bio:
• Background:
– Low-temperature physics (Leiden & Moscow)
– Joined Elsevier in 1988 as publisher in solid state physics
– 1991: arXiv => publishers will go out of business very soon!
• 1997-now: Disruptive Technologies Director, focus on better representation of scientific knowledge:
– Identifying key knowledge elements in articles (linguistics thesis)
– Building claim-evidence networks (through collaborations)
– Help build communities to accelerate rate of change (Force11)
• Starting 1/1/2013: VP Research Data Collaborations - why?
– Douglas Engelbart’s thinking: connect minds!
– My (non-biologist’s) understanding of biology:
4. Problem: a rose is not a rose:
• “Single specimens of C. ermineus show unchanged injected venom mass spectra and HPLC profiles over time. However, there was significant variability of the injected venom composition from specimen to specimen, in spite of their common biogeographic origin.”
Jose A. Rivera-Ortiz, Herminsul Cano, Frank Marí, Intraspecies variability of the injected venom of Conus ermineus, doi:10.1016/j.peptides.2010.11.014
• “D. desulfuricans CFA profiles for all intestinal strains (group 1) were approximately identical (98.2 to 99.8% similarity). A 92.4% similarity was evaluated in a group 2, containing six soil strains. The members of this group had 87% similarity with the type soil strain. All intestinal strains and soil strains were similar at the 85.5% level. Strains DV-3/84 DV-7/84 (group 3) showed 76.6% similarity to each other and were similar to all other strains at the 67.6% level.”
Zofia Dzierżewicz et al., Intraspecies variability of Desulfovibrio desulfuricans strains determined by the genetic profiles, FEMS Microbiology Letters, Volume 219, Issue 1, 14 February 2003, Pages 69–74, doi:10.1016/S0378-1097(02)01199-0
=> A specimen is not a species!
5. Problem: gene expression varies with:
Age: “SIRT1-Associated genes are deregulated in the aged brain”
Philipp Oberdoerffer et al., SIRT1 Redistribution on Chromatin Promotes Genomic Stability but Alters Gene Expression during Aging, Cell, Volume 135, Issue 5, 28 November 2008, Pages 907–918, doi:10.1016/j.cell.2008.10.025
Smell: “…major urinary proteins […] mediate the pregnancy blocking effects of male urine”
P.A. Brennan, et al., Patterns of expression of the immediate-early gene egr-1 in the accessory olfactory bulb of female mice exposed to pheromonal constituents of male urine, Neuroscience, Volume 90, Issue 4, June 1999, Pages 1463–1470, doi:10.1016/S0306-4522(98)00556-9
Hunger: “Out of the ~30K genes, about 10K are differentially expressed in liver cells when an animal is in different states of satiety.”
Zhang F, Xu X, Zhou B, He Z, Zhai Q (2011) Gene Expression Profile Change and Associated Physiological and Pathological Effects in Mouse Liver Induced by Fasting and Refeeding. PLoS ONE 6(11): e27553. doi:10.1371/journal.pone.002755
Light: “Longer-term enrichment training also altered the mRNA levels of many genes associated with structural changes that occur during neuronal growth.”
Cailotto C., et al. (2009) Effects of Nocturnal Light on (Clock) Gene Expression in Peripheral Organs: A Role for the Autonomic Innervation of the Liver. PLoS ONE 4(5): e5650. doi:10.1371/journal.pone.0005650
=> Knowing genes is not knowing how they are expressed!
6. Problem: no man (or mouse) is an island…
• “We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals.”
The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
• “Colonization of an infant’s gastrointestinal tract begins at birth. The acquisition and normal development of the neonatal microflora is vital for the healthy maturation of the immune system.”
Mackie RI, Sghir A, Gaskins HR., Developmental microbial ecology of the neonatal gastrointestinal tract. Am J Clin Nutr. 1999 May;69(5):1035S-1045S
=> An animal is an ecosystem!
7. Problem: system interactions create even greater complexity:
• Computing cancer: “No amount of information about what happens inside a single cell can ever tell you what a tissue is going to do,” [Glazier] says. “Much of the information and complexity of tissues and life is embedded in the way cells talk to each other and the extracellular environment.”
• Megadata: “These complex emergent systems are impossible to understand,” [Agus] says. “Our level of understanding is just so cursory that we have to start to look for what they call, in physics, coarse-grained elements.” “[we] founded Applied Proteomics to create a protein diagnostic that reveals not just where a cancer is, but how it interacts with the body”
Nature Special Issue Vol. 491 No. 7425 ‘Physical Scientists Take On Cancer’
=> The whole is more than the sum of its parts!
8. Big problem:
=> A specimen is not a species
=> Knowing genes is not knowing how they are expressed
=> An animal is an ecosystem
=> The whole is more than the sum of its parts
LIFE IS COMPLICATED!!
http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg
9. Statistics to the rescue!
With enough observations, trends and anomalies can be detected:
• “Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far.”
The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
• “The large sample size - 4,298 North Americans of European descent and 2,217 African Americans - has enabled the researchers to mine down into the human genome.”
Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.
• “A profile unique for a DNA sample source is obtained … a series of numbers are generated which can be used as a bar code for that DNA source. A registry of bar codes would make it easy to compare DNA samples”
Roland M. Nardone, Ph.D., Eradication of Cross-Contaminated Cell Lines: A Call for Action, http://www.sivb.org/publicPolicy_Eradication.pdf
10. We need ‘incidental collaboratories’
• Collect: store data at the level of the experiment:
– Accessible through a single interface
– With enough metadata to know what was done/seen
• Connect: allow analyses over:
– Similar experiment types
– Experiments done with/on similar biological ‘things’:
    • Species, strains, systems, cells
    • Anatomical components (e.g. spleen, hypothalamus)
    • Antibodies, biomarkers, bioactive chemicals, etc.
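The Collect/Connect idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `ExperimentRecord` and `Collaboratory`, the field names, and the example labs and entities are all hypothetical and do not describe any existing Elsevier or repository system.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExperimentRecord:
    """One experiment, stored with enough metadata to know what was done/seen."""
    record_id: str        # hypothetical identifier scheme
    lab: str
    experiment_type: str  # e.g. "immunostaining"
    # Biological 'things' involved: species, anatomy, antibodies, ...
    entities: frozenset = field(default_factory=frozenset)


class Collaboratory:
    """A single interface over records from many labs (illustrative only)."""

    def __init__(self):
        self._by_type = defaultdict(list)
        self._by_entity = defaultdict(list)

    def collect(self, record):
        # Collect: index each record by experiment type and by every entity.
        self._by_type[record.experiment_type].append(record)
        for entity in record.entities:
            self._by_entity[entity].append(record)

    def connect_by_type(self, experiment_type):
        # Connect: all experiments of a similar type, across labs.
        return list(self._by_type.get(experiment_type, []))

    def connect_by_entity(self, entity):
        # Connect: all experiments done with/on the same biological 'thing'.
        return list(self._by_entity.get(entity, []))


# Invented example data: three labs that never planned to collaborate.
collab = Collaboratory()
collab.collect(ExperimentRecord(
    "r1", "lab-A", "immunostaining",
    frozenset({"mouse", "hypothalamus", "antibody-X"})))
collab.collect(ExperimentRecord(
    "r2", "lab-B", "patch clamp",
    frozenset({"mouse", "olfactory bulb"})))
collab.collect(ExperimentRecord(
    "r3", "lab-C", "immunostaining",
    frozenset({"rat", "spleen", "antibody-X"})))

shared = collab.connect_by_entity("antibody-X")
print(sorted(r.lab for r in shared))
```

In this sketch lab-A and lab-C become 'incidental' collaborators simply because both recorded the same antibody in their metadata; neither had to know about the other in advance.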
11. Problem: biological research is quite insular:
• Biology is small: because objects/equipment are 10^-5 – 10^2 m, you can work alone (‘King’ and ‘subjects’).
• Biology is messy: it doesn’t happen behind a terminal.
• Biology is competitive: different people with similar skill sets, vying for the same grants.
• In summary: it does not inherently promote collaboration (vs., for instance, big physics or astronomy).
[Figure: lab workflow diagram with the steps Prepare, Ponder, Observe, Communicate, Analyze]
12. We need to pop the lab bubble!
Labs go from being information islands, to being ‘sensors in a network’.
[Figure: three lab cycles (Prepare, Analyze, Communicate), each labelled with Observations, plus a single Think step]
13. Some objections, and rebuttals:
• Objection: “But our lab notebooks are all on paper”
– Rebuttal: Develop smart phone/tablet apps for data input
• Objection: “I need to see a direct benefit from something I spend my time on”
– Rebuttal: Develop ‘data manipulation dashboard’ for PI to allow better access to full experimental output for his/her lab
• Objection: “I am afraid other people might scoop my discoveries”
– Rebuttal: Develop intra-lab data communication systems first and allow timed/granular data export
• Objection: “I want things to be peer reviewed before I expose them”
– Rebuttal: Allow reviewers access to experimental database before publication (of data or paper)
• Objection: “I don’t really trust anyone else’s data – well, except for the guys I went to Grad School with…”
– Rebuttal: Add a social networking component to this data repository so you know who (to the individual) created that data point.
14. Elsevier Research Data Services: Goals
1. Help add more data into (existing, open) data repositories: more data in, annotated, available
2. Make them more interoperable: work towards collaboratory model by connecting databases
3. Find ways to make them sustainable, e.g.:
– Service-level agreements: to funders/institutes
– With Lab notebook: subscriptions to projects
– Back-end analytics: to companies
15. RDS Guiding Principles:
• In principle, all open data stays open and URLs, front end etc. stay where they are (i.e. with repository)
• Collaboration is tailored to data repositories’ unique needs/interests and of a ‘service-model’ type:
– Aspects where collaboration is needed are discussed
– A collaboration plan is drawn up using a Service-Level Agreement: agree on time, conditions, etc.
– All communication, finance, IPR etc. is completely transparent at all times.
• Very small (2/3 people) department; immediate communication; instant deployment of ideas
16. RDS Approach:
• Collaborate and build on relationships with data repositories
• Integrate with other content sources, if possible
• Build annotation and standardisation tools and processes to implement this
• Develop next-generation infrastructure solutions for back-end integration
• Explore creative revenue opportunities
17. NIF Antibody Registry:
Problem:
• 95 antibodies were identified in 8 papers
• 52 did not contain enough information to determine the antibody used
• Some provided details in another paper
• Failed to give species, vendor, catalog #
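As a rough illustration of the reporting gap, the check below flags antibody citations that lack the three details named on the slide (species, vendor, catalog number). It is a minimal sketch: the `AntibodyCitation` class, its field names, and the example vendor and catalog values are invented for illustration and are not the NIF Antibody Registry's actual schema. Only the counts 52 and 95 come from the slide.

```python
from dataclasses import dataclass
from typing import Optional

# Details the slide says papers failed to give.
REQUIRED_FIELDS = ("species", "vendor", "catalog_number")


@dataclass
class AntibodyCitation:
    """How an antibody is described in a paper (hypothetical structure)."""
    target: str
    species: Optional[str] = None
    vendor: Optional[str] = None
    catalog_number: Optional[str] = None


def missing_fields(citation):
    """Return the names of required details the citation leaves out."""
    return [name for name in REQUIRED_FIELDS if not getattr(citation, name)]


def is_identifiable(citation):
    """A citation counts as identifiable only if nothing required is missing."""
    return not missing_fields(citation)


# Invented examples; vendor and catalog values are placeholders.
citations = [
    AntibodyCitation("GFAP", species="rabbit",
                     vendor="ExampleVendor", catalog_number="EX-0001"),
    AntibodyCitation("NeuN", species="mouse"),
    AntibodyCitation("c-Fos"),
]

for citation in citations:
    gaps = missing_fields(citation)
    status = "identifiable" if not gaps else "missing " + ", ".join(gaps)
    print(f"{citation.target}: {status}")

# Share of antibodies that could not be identified, using the slide's counts.
unidentifiable_share = 52 / 95
print(f"{unidentifiable_share:.1%} of the 95 antibodies could not be identified")
```

A journal submission form or lab notebook app could run a check of this kind at data entry, which is when the missing details are still easy to supply.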
Solution #1:
• Journals ask authors to provide antibody catalog nr
• Link to NIF Registry from manufacturers’/vendors’ sites
Solution #2:
• Pilot with a lab:
18. Let’s start with the Urban Lab
• Getting antibodies
• And messy bits
• From the notebook
• Into Nathan Urban’s command center
• By providing
– 7” Tablets
– Links to IgorPro
– A dashboard UI
19. My questions to you:
• Thoughts on this approach:
– In principle?
– In practice?
• Do you see serious hurdles:
– Are we overlapping with other initiatives; if so, are we complementary?
– How does this connect to libraries/local repositories?
– Are there sensitivities/pain points we are overlooking?
• Where to start:
– Are antibodies ok?
– Is a neuroscience lab ok?
– Thoughts on data repositories/platforms to connect to?
20. Your questions to me?
a.dewaard@elsevier.com
http://elsatglabs.com/labs/anita/
http://www.slideshare.net/anitawaard
Thanks go to:
• Anita Bandrowski and Maryann Martone, NIF
• Nathan Urban, Shreejoy Tripathy, CMU
• David Marques, SVP RDS