Creating an Urban Legend:
A System for Electrophysiology Data
Management and Exploration
Anita de Waard
VP Research Data Collaborations
a.dewaard@elsevier.com
Outline:
• Life is complicated
• A small pilot
• Context and next steps
Life is complicated!
1. Interspecies variability > A specimen is not a species!
2. Gene expression variability > Knowing genes is not
knowing how they are expressed!
3. Microbiome > An animal is an ecosystem!
4. Systems biology > Whole is more than the sum of its parts!
5. Models vs. experiment > Are we talking about the same
things? In a way we can all use?
6. Dynamics > Life is not in equilibrium!

=> Reductionism doesn’t
work for living systems!
http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg
Statistics could help…
With enough observations, trends and anomalies can be
detected:

• “Here we present resources from a population of 242
healthy adults sampled at 15 or 18 body sites up to three
times, which have generated 5,177 microbial taxonomic
profiles from 16S ribosomal RNA genes and over 3.5
terabases of metagenomic sequence so far.”
The Human Microbiome Project Consortium, Structure, function and diversity of
the healthy human microbiome, Nature 486, 207–214 (14 June 2012)
doi:10.1038/nature11234

• “The large sample size — 4,298 North Americans of
European descent and 2,217 African Americans — has
enabled the researchers to mine down into the human
genome.”
Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing
study emphasizes importance of rare variants in disease.
…but biological research is insular.
• Biology is small: size 10^-5 – 10^2 m,
scientist can work alone (‘King’ and
‘subjects’).
• Biology is messy: it doesn’t
happen behind a terminal.
• Biology is competitive: many
people with similar skill sets,
vying for the same grants.
• In summary: the structure of biological
research does not inherently promote
collaboration (vs., for instance, HE physics or
astronomy (and they’re not all they’re cracked up to be,
either…)).
[Figure: the research cycle: Prepare, Observe, Analyze, Ponder, Communicate]
What if we could connect experiments?
Across labs, experiments:
track reagents and how
they are used
[Figure: two research cycles (Prepare, Analyze, Communicate) with their Observations]
What if we could connect experiments?
Compare outcome of
interactions with these
entities

[Figure: two research cycles (Prepare, Analyze, Communicate) with their Observations]
What if we could connect experiments?
Build a ‘virtual reagent
spectrogram’ by comparing
how different entities
interacted in different
experiments

[Figure: two research cycles (Prepare, Analyze, Communicate) with their Observations and a "Think" step]

Reason collectively!
Research Data Management today:
Using antibodies and squishy bits, grad students experiment
and enter details into their lab notebook.
The PI then tries to make sense of their slides
and writes a paper.
End of story.
An Urban Legend is born:
• How can we make a standard neuroscience
wet lab more data-sharing savvy?
• Incorporate structured workflows into the daily
practice of a typical electrophysiology lab (the
Urban Lab at CMU)
– What does it take?
– Where are points of conflict?

• 1-year pilot, funded by Elsevier RDS:
– CMU: Shreejoy Tripathy, manage/user test
– Elsevier: development, UI, project management
Goal: Enable effective data sharing:
• Effective data sharing = “someone who is not the
person who collected the data can understand the
experiment and data” (Shreejoy’s definition)
– So datasets should be more or less self-describing
– > 90% of data sharing use cases are an experimentalist
sharing data with a future version of herself or with a
labmate

• Not just experimental data file, but also the
experimental metadata:
– What was done? What does this variable mean?
– This is usually stored in paper lab notebooks,
understandable by only the experimenter
Main Assumptions:
1. Effective data sharing
includes raw data files +
experimental metadata
(typically stored in a lab
notebook)
2. You know most about an
experiment while you’re
performing it
3. Improved data practices can
make labs more productive
and more creative

[Figure: example raw data file, SDB_MC_12_voltages.mat]
Components:
Metadata App:
Data integration:
• Syncing of metadata
app and
electrophysiology data
acquisition via server
• Each trace of
experimental data
annotated with
metadata
• Currently specific to IGOR Pro;
support for pClamp and other
acquisition packages to be
added later as needed
Electrophysiology Data Looks like this:
Semantic Integration:
The entity table uses a scope field and
an attributes field to create
a NoSQL-like, hierarchical
key/value structure in
PostgreSQL with the built-in
hstore extension.
Ontology tables (normalized
SQL) map
keys, values & scopes to
ontology information.

Entity
ID : UUID
Investigator : references investigators table
created : timestamp
last_modified : timestamp
scope : string ~ /[A-Z]\d+(::[A-Z]\d+)*/
attributes : hstore (string → string mapping)
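The entity record above can be sketched in memory as follows. This is a minimal illustration, not the project's code: the class name, the investigator value, and the use of a plain dict in place of hstore are assumptions, and the scope pattern is written with `\d` and `A-Z` as an interpretation of the pattern on the slide.

```python
import re
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Scope pattern as interpreted from the slide: a capital letter plus
# one or more digits, repeated with "::" separators.
SCOPE_PATTERN = re.compile(r"[A-Z]\d+(::[A-Z]\d+)*")


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Entity:
    """Hypothetical in-memory stand-in for one row of the entity table."""

    investigator: str  # stands in for the reference to the investigators table
    scope: str
    attributes: dict[str, str] = field(default_factory=dict)  # hstore: string -> string
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    created: datetime = field(default_factory=_now)
    last_modified: datetime = field(default_factory=_now)

    def __post_init__(self) -> None:
        # Reject scopes that do not match the pattern in full.
        if SCOPE_PATTERN.fullmatch(self.scope) is None:
            raise ValueError(f"invalid scope: {self.scope!r}")


# "investigator_1" is a placeholder name, not a real account.
run = Entity("investigator_1", "P1::S1::R3", {"electrode_name": "E1"})
```

Constructing an Entity with a scope such as "p1::S1" raises ValueError, which is roughly what a CHECK constraint on the scope column would do in the database.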
Data dashboard (planned):
• Use collected metadata to sort
experiments: organize by
mouse strain, neuron type,
animal age
• Enable in-browser analyses:
track provenance of analyzed
data back to raw data: “what
was that outlier?”
• Simple link in to publishing/data
sharing tools: “we can publish
papers no one else can”
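Organizing experiments by collected metadata could look roughly like the sketch below. The record layout, the function name, and the attribute values are hypothetical; only the idea of grouping by keys such as mouse strain or neuron type comes from the slide.

```python
from collections import defaultdict


def group_by_attribute(records, key):
    """Group record scopes by the value of one metadata key."""
    groups = defaultdict(list)
    for record in records:
        # Records that lack the key are collected under a visible placeholder.
        value = record["attributes"].get(key, "(not recorded)")
        groups[value].append(record["scope"])
    return dict(groups)


# Hypothetical records; the attribute values are placeholders.
records = [
    {"scope": "P1::S1::R1", "attributes": {"mouse_strain": "strain_a", "neuron_type": "type_x"}},
    {"scope": "P1::S1::R2", "attributes": {"mouse_strain": "strain_a", "neuron_type": "type_y"}},
    {"scope": "P2::S1::R1", "attributes": {"mouse_strain": "strain_b"}},
]

by_strain = group_by_attribute(records, "mouse_strain")
by_neuron = group_by_attribute(records, "neuron_type")
```

Because each group holds scopes rather than copies of the data, an outlier in an analysis can be traced back to the run that produced it.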
Next steps Urban Legend Project:
• Populate data server with many experiments:
– Are people using it? Why/why not?
– What questions can we answer now that we
couldn’t before?
• Export data to neuroscience databases: NIF, INCF
Dataspace, neuroelectro.org
• How adaptable is this solution for use in other labs?
• Can we scale this up and make it sustainable?
• Software is available! Ready to swap this simple system
for something better: point is process!
• How does it fit into a larger data infrastructure within
the institution/nationally/internationally?
Elsevier Research Data Services:
• Main goal: make research data optimally available,
discoverable and reusable
• Collaboration is tailored to partner’s unique needs:
– Working with a few domain-specific and institutional
repositories and institutions
– Aspects where collaboration is needed are discussed
– Collaboration plan is drawn up using SLA: agree on time,
conditions, etc.

• 2013/2014: series of pilots, studies and reports to
enable feasibility study:
– What are key needs?
– Can Elsevier play a role: skillsets, partnerships?
– Is there a (transparent) business model for this?
Institutional Context:
[Figure: data flow and performance reporting between Researchers, Electronic Lab Notebooks, Generic Data Storage (such as Dropbox), Research Data Repositories, the Institutional Repository, the Institution (Library, Research Office) and Funding Agencies. Labels in the diagram: Unified Metadata Layer, Deposit/Store, Curation, Indexing, Indexing & Search, Integrated Data Search, Integrated Performance Query, Usage/Citation reporting, Performance reporting.]
Data Initiatives:
• Data Citation group:
– Synthesize principles of proper data citation
– ‘Declaration of Data Citation Principles’, 8 principles of
successful data citation: http://www.force11.org/datacitation

• Resource Identification Initiative:
– Promote research resource identification, discovery, and reuse
– Resource Identification Portal http://scicrunch.com/resources
– Central location for obtaining research resource identifiers
(RRIDs) for materials and software used in biomedical research
• Antibody: Abgent Cat# AP7251E, ABR:AB_2140114
• Tool: CellProfiler Image Analysis Software, NIFRegistry:nif-0000-00280
• Organism: MGI:MGI:3840442
Summary:
• Life is complicated: knowledge needs to
be connected!
• A small pilot: “Urban Legend”
• Context and next steps:
– Working with institutions and databases to piece
together this puzzle
– Force11 is contributing some pieces
Thank you!
Collaborations and discussions gratefully acknowledged:
• CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Rick
Gerkin, Santosh Chandrasekaran, Matthew Geramita, Eduard Hovy
• UCSD: Phil Bourne, Brian Shoettlander, David Minor, Declan
Fleming, Ilya Zaslavsky
• NIF/Force11: Maryann Martone, Anita Bandrowski
• OHSU: Melissa Haendel, Nicole Vasilevsky
• California Digital Library: Carly Strasser, John Kunze, Stephen
Abrams
• Elsevier: Mark Harviston, Jez Alder, David Marques
Questions?
Anita de Waard
VP Research Data Collaborations
a.dewaard@elsevier.com

http://researchdata.elsevier.com/
Scopes
Follows the format L#::L#::L#...
where L is a letter identifier and # is one or more decimal
digits.
Example: P1::S1::R3 = Animal Prep 1, Slice 1, Run 3
A letter need not be globally unique, only unique within its chain.
Example: P1::S1::E1 (Electrode) is different from P1::S1::R1::E1
(Run-Electrode)
Scopes are 1-indexed.
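A scope string can be taken apart with a few lines of code. The sketch below is illustrative only; the function names are hypothetical and it assumes single capital letters followed by digits, as in the examples above.

```python
import re

# One scope segment: a capital letter followed by decimal digits.
SEGMENT = re.compile(r"([A-Z])(\d+)")


def parse_scope(scope: str) -> list[tuple[str, int]]:
    """Split 'P1::S1::R3' into [('P', 1), ('S', 1), ('R', 3)]."""
    segments = []
    for part in scope.split("::"):
        match = SEGMENT.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed segment {part!r} in scope {scope!r}")
        number = int(match.group(2))
        if number < 1:
            raise ValueError("scopes are 1-indexed")
        segments.append((match.group(1), number))
    return segments


def letter_chain(scope: str) -> tuple[str, ...]:
    """The letters alone, which give a segment its meaning within a chain."""
    return tuple(letter for letter, _ in parse_scope(scope))


def parent_scope(scope: str):
    """Scope one level up, or None for a top-level scope."""
    head, separator, _ = scope.rpartition("::")
    return head if separator else None
```

The chain for P1::S1::E1 is (P, S, E), while the chain for P1::S1::R1::E1 is (P, S, R, E), so the same letter E can denote an electrode under a slice in one case and under a run in the other.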
Attributes
Each scope has an attributes field that consists
of multiple key/value pairs.
Keys are unique and not tied to a scope (e.g.
electrode_name instead of name).
A key can be a choice, a scalar (with units), or a free-text field; which one is determined by the
ontology tables.
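How ontology tables might drive validation of attribute values can be sketched as below. The dictionary stands in for the ontology tables, and every key definition, choice value, and unit in it is a hypothetical placeholder; since hstore stores strings, scalar values are checked by parsing them as numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyDefinition:
    kind: str  # "choice", "scalar", or "freetext"
    choices: tuple[str, ...] = ()  # allowed values for a choice key
    unit: str = ""  # unit for a scalar key


# Hypothetical stand-in for the ontology tables.
ONTOLOGY = {
    "mouse_strain": KeyDefinition("choice", choices=("strain_a", "strain_b")),
    "animal_age": KeyDefinition("scalar", unit="days"),
    "electrode_name": KeyDefinition("freetext"),
}


def validate(key: str, value: str) -> None:
    """Raise KeyError for an unknown key, ValueError for a bad value."""
    definition = ONTOLOGY.get(key)
    if definition is None:
        raise KeyError(f"unknown attribute key: {key}")
    if definition.kind == "choice" and value not in definition.choices:
        raise ValueError(f"{value!r} is not an allowed value for {key}")
    if definition.kind == "scalar":
        float(value)  # raises ValueError if the string is not numeric
```

Because keys are not tied to a scope, a single lookup by key is enough; the scope never has to be consulted to find the right definition.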
Downsides to Flexible Schema
Converting to/from the flat scopes to a true hierarchy
(say in JSON) is rather complicated and led to many
errors in the App.
Very easy to get corrupted data in the App.
The schema is closely aligned with the way the Lua app did
things.
A flexible schema was a good choice, but not scopes for
hierarchies.
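The conversion between flat scopes and a nested hierarchy can be sketched as follows. This is an illustration under assumptions, not the app's code: the node layout with "attributes" and "children" keys and both function names are hypothetical.

```python
def nest(flat: dict) -> dict:
    """Turn {'P1::S1': {...}, ...} into nested nodes keyed by segment."""
    root = {"attributes": {}, "children": {}}
    for scope, attributes in flat.items():
        node = root
        for segment in scope.split("::"):
            # Create missing intermediate nodes with empty attributes.
            node = node["children"].setdefault(
                segment, {"attributes": {}, "children": {}}
            )
        node["attributes"] = dict(attributes)
    return root["children"]


def flatten(tree: dict, prefix: str = "") -> dict:
    """Inverse of nest: walk the tree and rebuild the flat scope keys."""
    flat = {}
    for segment, node in tree.items():
        scope = f"{prefix}::{segment}" if prefix else segment
        flat[scope] = dict(node["attributes"])
        flat.update(flatten(node["children"], scope))
    return flat


flat_records = {
    "P1": {"animal_age": "21"},
    "P1::S1": {},
    "P1::S1::R3": {"electrode_name": "E1"},
}
tree = nest(flat_records)
```

The round trip only returns the original mapping when every intermediate scope has its own record. If P1::S1 were missing from the flat data, nest would invent an empty node for it and flatten would emit a record that was never stored, which is one way such conversions can go wrong.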
Raw Data
For use in data-dashboard.
Standardized on HDF5.
Files uploaded via FTP.
Username, filename, and metadata within the
HDF5 file are used to identify the associated metadata
records.
Batch or individually uploaded.

More Related Content

What's hot

SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...Carole Goble
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataPaul Groth
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...Carole Goble
 
Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...Riccardo Albertoni
 
Automating the process of continuously prioritising data, updating and deploy...
Automating the process of continuously prioritising data, updating and deploy...Automating the process of continuously prioritising data, updating and deploy...
Automating the process of continuously prioritising data, updating and deploy...Ola Spjuth
 
One Scientist’s Wish List for Scientific Publishers
One Scientist’s Wish List for Scientific PublishersOne Scientist’s Wish List for Scientific Publishers
One Scientist’s Wish List for Scientific PublishersPhilip Bourne
 
Overview of Bibliometrics - IAP Course version 1.1
Overview of Bibliometrics - IAP Course version 1.1Overview of Bibliometrics - IAP Course version 1.1
Overview of Bibliometrics - IAP Course version 1.1Micah Altman
 
Machines are people too
Machines are people tooMachines are people too
Machines are people tooPaul Groth
 
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...Susanna-Assunta Sansone
 
Introduction to Data Management
Introduction to Data ManagementIntroduction to Data Management
Introduction to Data ManagementAmanda Whitmire
 
W3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description GuidelinesW3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description GuidelinesMichel Dumontier
 
Towards Automated AI-guided Drug Discovery Labs
Towards Automated AI-guided Drug Discovery LabsTowards Automated AI-guided Drug Discovery Labs
Towards Automated AI-guided Drug Discovery LabsOla Spjuth
 
The contribution of authors: A study of the relationship between the size and...
The contribution of authors: A study of the relationship between the size and...The contribution of authors: A study of the relationship between the size and...
The contribution of authors: A study of the relationship between the size and...Rickard Danell
 
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental MetadataMaking it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental MetadataMichel Dumontier
 
Pulverer-embo-source data-nfdp13
Pulverer-embo-source data-nfdp13Pulverer-embo-source data-nfdp13
Pulverer-embo-source data-nfdp13DataDryad
 
Data Publishing Workflows with Dataverse
Data Publishing Workflows with DataverseData Publishing Workflows with Dataverse
Data Publishing Workflows with DataverseMicah Altman
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Anita de Waard
 
Academig: Pitch Deck
Academig: Pitch DeckAcademig: Pitch Deck
Academig: Pitch DeckRony Pozner
 
Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck Todd Vision
 

What's hot (20)

SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
 
Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...
 
Automating the process of continuously prioritising data, updating and deploy...
Automating the process of continuously prioritising data, updating and deploy...Automating the process of continuously prioritising data, updating and deploy...
Automating the process of continuously prioritising data, updating and deploy...
 
One Scientist’s Wish List for Scientific Publishers
One Scientist’s Wish List for Scientific PublishersOne Scientist’s Wish List for Scientific Publishers
One Scientist’s Wish List for Scientific Publishers
 
Overview of Bibliometrics - IAP Course version 1.1
Overview of Bibliometrics - IAP Course version 1.1Overview of Bibliometrics - IAP Course version 1.1
Overview of Bibliometrics - IAP Course version 1.1
 
Machines are people too
Machines are people tooMachines are people too
Machines are people too
 
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
 
Introduction to Data Management
Introduction to Data ManagementIntroduction to Data Management
Introduction to Data Management
 
W3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description GuidelinesW3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description Guidelines
 
Towards Automated AI-guided Drug Discovery Labs
Towards Automated AI-guided Drug Discovery LabsTowards Automated AI-guided Drug Discovery Labs
Towards Automated AI-guided Drug Discovery Labs
 
The contribution of authors: A study of the relationship between the size and...
The contribution of authors: A study of the relationship between the size and...The contribution of authors: A study of the relationship between the size and...
The contribution of authors: A study of the relationship between the size and...
 
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental MetadataMaking it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
 
Peer Review and Science2.0
Peer Review and Science2.0Peer Review and Science2.0
Peer Review and Science2.0
 
Pulverer-embo-source data-nfdp13
Pulverer-embo-source data-nfdp13Pulverer-embo-source data-nfdp13
Pulverer-embo-source data-nfdp13
 
Data Publishing Workflows with Dataverse
Data Publishing Workflows with DataverseData Publishing Workflows with Dataverse
Data Publishing Workflows with Dataverse
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013
 
Academig: Pitch Deck
Academig: Pitch DeckAcademig: Pitch Deck
Academig: Pitch Deck
 
Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck
 

Viewers also liked

NAG Consultation Guidance
NAG Consultation GuidanceNAG Consultation Guidance
NAG Consultation GuidanceNick Norton
 
Melbourne's Biodiversity Conservation Strategy_Ecology and Heritage Partners ...
Melbourne's Biodiversity Conservation Strategy_Ecology and Heritage Partners ...Melbourne's Biodiversity Conservation Strategy_Ecology and Heritage Partners ...
Melbourne's Biodiversity Conservation Strategy_Ecology and Heritage Partners ...Aaron Organ
 
Zach Beaulieu Portfolio
Zach Beaulieu PortfolioZach Beaulieu Portfolio
Zach Beaulieu Portfolioguestd82bf3
 
Evaluation Question 3
Evaluation Question 3Evaluation Question 3
Evaluation Question 3shunn1995
 
Envisioning Good Design in Lancaster County and Philadelphia
Envisioning Good Design in Lancaster County and PhiladelphiaEnvisioning Good Design in Lancaster County and Philadelphia
Envisioning Good Design in Lancaster County and PhiladelphiaWallace Roberts & Todd
 
Conservation. Lorena and Floren. 3rd
Conservation. Lorena  and Floren. 3rdConservation. Lorena  and Floren. 3rd
Conservation. Lorena and Floren. 3rdsluguarda
 
Advancing Sustainability in Discretionary Review 1
Advancing Sustainability in Discretionary Review 1Advancing Sustainability in Discretionary Review 1
Advancing Sustainability in Discretionary Review 1Wallace Roberts & Todd
 
Leadership Essentials: Delivering Your Local Plan
Leadership Essentials: Delivering Your Local PlanLeadership Essentials: Delivering Your Local Plan
Leadership Essentials: Delivering Your Local PlanPAS_Team
 
BizIt 2007 Presentation
BizIt 2007 PresentationBizIt 2007 Presentation
BizIt 2007 PresentationKaren Cheng
 
The Conservation Strategy
The Conservation StrategyThe Conservation Strategy
The Conservation Strategyguestb92924e
 
South Suburban Master Plan Public Meeting Presentation
South Suburban Master Plan Public Meeting PresentationSouth Suburban Master Plan Public Meeting Presentation
South Suburban Master Plan Public Meeting PresentationGreg Collette
 
Manhattan Kansas Bicycle Master Plan Revision
Manhattan Kansas Bicycle Master Plan RevisionManhattan Kansas Bicycle Master Plan Revision
Manhattan Kansas Bicycle Master Plan Revisionmwesch
 
Dlr GI Strategy 2015-2022_final_medres_recvd_20141203
Dlr GI Strategy 2015-2022_final_medres_recvd_20141203Dlr GI Strategy 2015-2022_final_medres_recvd_20141203
Dlr GI Strategy 2015-2022_final_medres_recvd_20141203Aidan J ffrench
 
Critical assesment of the Sustainable Urban Drainage component of the Church ...
Critical assesment of the Sustainable Urban Drainage component of the Church ...Critical assesment of the Sustainable Urban Drainage component of the Church ...
Critical assesment of the Sustainable Urban Drainage component of the Church ...Achim von Malotki
 

Viewers also liked (20)

Willacy count master plan combined
Willacy count master plan combinedWillacy count master plan combined
Willacy count master plan combined
 
NAG Consultation Guidance
NAG Consultation GuidanceNAG Consultation Guidance
NAG Consultation Guidance
 
Melbourne's Biodiversity Conservation Strategy_Ecology and Heritage Partners ...
Melbourne's Biodiversity Conservation Strategy_Ecology and Heritage Partners ...Melbourne's Biodiversity Conservation Strategy_Ecology and Heritage Partners ...
Melbourne's Biodiversity Conservation Strategy_Ecology and Heritage Partners ...
 
Zach Beaulieu Portfolio
Zach Beaulieu PortfolioZach Beaulieu Portfolio
Zach Beaulieu Portfolio
 
Evaluation Question 3
Evaluation Question 3Evaluation Question 3
Evaluation Question 3
 
Envisioning Good Design in Lancaster County and Philadelphia
Envisioning Good Design in Lancaster County and PhiladelphiaEnvisioning Good Design in Lancaster County and Philadelphia
Envisioning Good Design in Lancaster County and Philadelphia
 
Land Use & Planning - Fresno FPC
Land Use & Planning - Fresno FPCLand Use & Planning - Fresno FPC
Land Use & Planning - Fresno FPC
 
Conservation. Lorena and Floren. 3rd
Conservation. Lorena  and Floren. 3rdConservation. Lorena  and Floren. 3rd
Conservation. Lorena and Floren. 3rd
 
Advancing Sustainability in Discretionary Review 1
Advancing Sustainability in Discretionary Review 1Advancing Sustainability in Discretionary Review 1
Advancing Sustainability in Discretionary Review 1
 
Leadership Essentials: Delivering Your Local Plan
Leadership Essentials: Delivering Your Local PlanLeadership Essentials: Delivering Your Local Plan
Leadership Essentials: Delivering Your Local Plan
 
BizIt 2007 Presentation
BizIt 2007 PresentationBizIt 2007 Presentation
BizIt 2007 Presentation
 
Singapore
SingaporeSingapore
Singapore
 
The Conservation Strategy
The Conservation StrategyThe Conservation Strategy
The Conservation Strategy
 
SNEAPA 2013 Thursday b5 10_30 waterbury green
SNEAPA 2013 Thursday b5 10_30 waterbury greenSNEAPA 2013 Thursday b5 10_30 waterbury green
SNEAPA 2013 Thursday b5 10_30 waterbury green
 
South Suburban Master Plan Public Meeting Presentation
South Suburban Master Plan Public Meeting PresentationSouth Suburban Master Plan Public Meeting Presentation
South Suburban Master Plan Public Meeting Presentation
 
Kerns usgbc south_fl
Kerns usgbc south_flKerns usgbc south_fl
Kerns usgbc south_fl
 
Portfolio Protogeros Nikolaos
Portfolio Protogeros NikolaosPortfolio Protogeros Nikolaos
Portfolio Protogeros Nikolaos
 
Manhattan Kansas Bicycle Master Plan Revision
Manhattan Kansas Bicycle Master Plan RevisionManhattan Kansas Bicycle Master Plan Revision
Manhattan Kansas Bicycle Master Plan Revision
 
Dlr GI Strategy 2015-2022_final_medres_recvd_20141203
Dlr GI Strategy 2015-2022_final_medres_recvd_20141203Dlr GI Strategy 2015-2022_final_medres_recvd_20141203
Dlr GI Strategy 2015-2022_final_medres_recvd_20141203
 
Critical assesment of the Sustainable Urban Drainage component of the Church ...
Critical assesment of the Sustainable Urban Drainage component of the Church ...Critical assesment of the Sustainable Urban Drainage component of the Church ...
Critical assesment of the Sustainable Urban Drainage component of the Church ...
 

Similar to Creating an Urban Legend: A System for Electrophysiology Data Management and Exploration

Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Paul Groth
 
Looking for Data: Finding New Science
Looking for Data: Finding New ScienceLooking for Data: Finding New Science
Looking for Data: Finding New ScienceAnita de Waard
 
Research Objects for FAIRer Science
Research Objects for FAIRer Science Research Objects for FAIRer Science
Research Objects for FAIRer Science Carole Goble
 
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Anita de Waard
 
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceNC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceSusanna-Assunta Sansone
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
Data-knowledge transition zones within the biomedical research ecosystem
Data-knowledge transition zones within the biomedical research ecosystemData-knowledge transition zones within the biomedical research ecosystem
Data-knowledge transition zones within the biomedical research ecosystemMaryann Martone
 
ODSC East 2017: Data Science Models For Good
ODSC East 2017: Data Science Models For GoodODSC East 2017: Data Science Models For Good
ODSC East 2017: Data Science Models For GoodKarry Lu
 
GARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant ScienceGARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant ScienceDavid Johnson
 
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Susanna-Assunta Sansone
 
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Susanna-Assunta Sansone
 
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...William Gunn
 
Studying archives of online behavior
Studying archives of online behaviorStudying archives of online behavior
Studying archives of online behaviorJames Howison
 
Towards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesTowards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesAnita de Waard
 
Data sharing as part of the research ecosystem
Data sharing as part of the research ecosystemData sharing as part of the research ecosystem
Data sharing as part of the research ecosystemVarsha Khodiyar
 
Gaining credit for sharing research data
Gaining credit for sharing research dataGaining credit for sharing research data
Gaining credit for sharing research dataVarsha Khodiyar
 
Share and Reuse: how data sharing can take your research to the next level
Share and Reuse: how data sharing can take your research to the next levelShare and Reuse: how data sharing can take your research to the next level
Share and Reuse: how data sharing can take your research to the next levelKrzysztof Gorgolewski
 
Teaching Case Studies
Teaching Case StudiesTeaching Case Studies
Teaching Case StudiesJulie Goldman
 

Similar to Creating an Urban Legend: A System for Electrophysiology Data Management and Exploration (20)

Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.
 
Looking for Data: Finding New Science
Looking for Data: Finding New ScienceLooking for Data: Finding New Science
Looking for Data: Finding New Science
 
Research Objects for FAIRer Science
Creating an Urban Legend: A System for Electrophysiology Data Management and Exploration

  • 1. Creating an Urban Legend: A System for Electrophysiology Data Management and Exploration Anita de Waard VP Research Data Collaborations a.dewaard@elsevier.com
  • 2. Outline: • Life is complicated • A small pilot • Context and next steps
  • 3. Life is complicated! 1. Interspecies variability > A specimen is not a species! 2. Gene expression variability > Knowing genes is not knowing how they are expressed! 3. Microbiome > An animal is an ecosystem! 4. Systems biology > Whole is more than the sum of its parts! 5. Models vs. experiment > Are we talking about the same things? In a way we can all use? 6. Dynamics > Life is not in equilibrium! => Reductionism doesn’t work for living systems! http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg
  • 4. Statistics could help… With enough observations, trends and anomalies can be detected: • “Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far.” The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234 • “The large sample size — 4,298 North Americans of European descent and 2,217 African Americans — has enabled the researchers to mine down into the human genome.” Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.
  • 5. …but biological research is insular. • Biology is small: size 10^-5 – 10^2 m, scientist can work alone (‘King’ and ‘subjects’). • Biology is messy: it doesn’t happen behind a terminal. • Biology is competitive: many people with similar skill sets, vying for the same grants • In summary: the structure of biological research does not inherently promote collaboration (vs., for instance, HE physics or astronomy (and they’re not all they’re cracked up to be, either…)). [Diagram: research cycle of Prepare, Observe, Analyze, Ponder, Communicate]
  • 6. What if we could connect experiments? Across labs, experiments: track reagents and how they are used [Diagram: linked research cycles sharing Observations, with stages Prepare, Analyze, Communicate]
  • 7. What if we could connect experiments? Compare outcome of interactions with these entities [Diagram: as in slide 6]
  • 8. What if we could connect experiments? Build a ‘virtual reagent spectrogram’ by comparing how different entities interacted in different experiments. Reason collectively! [Diagram: as in slide 6, with an added Think stage]
  • 9. Research Data Management today: Using antibodies and squishy bits Grad Students experiment and enter details into their lab notebook. The PI then tries to make sense of their slides, and writes a paper. End of story.
  • 10. An Urban Legend is born: • How can we make a standard neuroscience wet lab more data-sharing savvy? • Incorporate structured workflows into the daily practice of a typical electrophysiology lab (the Urban Lab at CMU) – What does it take? – Where are points of conflict? • 1-year pilot, funded by Elsevier RDS: – CMU: Shreejoy Tripathy, manage/user test – Elsevier: development, UI, project management
  • 11. Goal: Enable Effective data sharing: • Effective data sharing = “someone who is not the person who collected the data can understand the experiment and data” (Shreejoy’s definition) – So datasets should be more or less self-describing – > 90% of data sharing use cases are an experimentalist sharing data with a future version of herself or with a labmate • Not just experimental data file, but also the experimental metadata: – What was done? What does this variable mean? – This is usually stored in paper lab notebooks, understandable by only the experimenter
  • 12. Main Assumptions: 1. Effective data sharing includes raw data files + experimental metadata (typically stored in a lab notebook) 2. You know most about an experiment while you’re performing it 3. Improved data practices can make labs more productive and more creative [File name shown on slide: SDB_MC_12_voltages.mat]
  • 15. Data integration: • Syncing of metadata app and electrophysiology data acquisition via server • Each trace of experimental data annotated with metadata • Currently IGOR Pro-specific; support for pClamp and other acquisition packages to be added later as needed
  • 17. Semantic Integration: The entity table uses a scope field and an attributes field to create a NoSQL-like, hierarchical key/value structure in PostgreSQL with the built-in hstore extension. Ontology information (in normalized SQL tables) maps keys, values and scopes to ontology information. Entity: • ID: UUID • Investigator: references investigators table • created: timestamp • last_modified: timestamp • scope: string ~ /[A-Z]\d+(::[A-Z]\d+)*/ • attributes: hstore (string → string mapping)
  • 18. Data dashboard (planned): • Use collected metadata to sort experiments: organize by mouse strain, neuron type, animal age • Enable in-browser analyses: track provenance of analyzed data back to raw data: “what was that outlier?” • Simple link in to publishing/data sharing tools: “we can publish papers no one else can”
  • 19. Next steps Urban Legend Project: • Populate data server with many experiments: – Are people using it? Why/why not? – What questions can we answer now that we couldn’t before? • Export data to neuroscience databases: NIF, INCF Dataspace, neuroelectro.org • How adaptable is this solution for use in other labs? • Can we scale this up and make it sustainable? • Software is available! Ready to swap this simple system for something better: point is process! • How does it fit into a larger data infrastructure within the institution/nationally/internationally?
  • 20. Elsevier Research Data Services: • Main goal: make research data optimally available, discoverable and reusable • Collaboration is tailored to partner’s unique needs: – Working with a few domain-specific and institutional repositories and institutions – Aspects where collaboration is needed are discussed – Collaboration plan is drawn up using SLA: agree on time, conditions, etc. • 2013/2014: series of pilots, studies and reports to enable feasibility study: – What are key needs? – Can Elsevier play a role: skillsets, partnerships? – Is there a (transparent) business model for this?
  • 21. Institutional Context: [Diagram: data flow, performance reporting, deposit/store, and indexing & search links connecting Researchers, Electronic Lab Notebooks, Generic Data Storage (such as Dropbox), Research Data Repositories, a Unified Metadata Layer (curation), Integrated Data Search, the Institutional Repository, the Institution (Library, Research Office), and Funding Agencies, with usage/citation reporting and an Integrated Performance Query]
  • 22. Data Initiatives: • Data Citation group: – Synthesize principles of proper data citation – ‘Declaration of Data Citation Principles’, 8 principles of successful data citation: http://www.force11.org/datacitation • Resource Identification Initiative: – Promote research resource identification, discovery, and reuse – Resource Identification Portal: http://scicrunch.com/resources – Central location for obtaining research resource identifiers (RRIDs) for materials and software used in biomedical research • Antibody: Abgent Cat# AP7251E, ABR:AB_2140114 • Tool: CellProfiler Image Analysis Software, NIFRegistry:nif-0000-00280 • Organism: MGI:MGI:3840442
  • 23. Summary: • Life is complicated: knowledge needs to be connected! • A small pilot: “Urban Legend” • Context and next steps: – Working with institutions and databases to piece together this puzzle – Force11 is contributing some pieces
  • 24. Thank you! Collaborations and discussions gratefully acknowledged: • CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Rick Gerkin, Santosh Chandrasekaran, Matthew Geramita, Eduard Hovy • UCSD: Phil Bourne, Brian Schottlaender, David Minor, Declan Fleming, Ilya Zaslavsky • NIF/Force11: Maryann Martone, Anita Bandrowski • OHSU: Melissa Haendel, Nicole Vasilevsky • California Digital Library: Carly Strasser, John Kunze, Stephen Abrams • Elsevier: Mark Harviston, Jez Alder, David Marques
  • 25. Questions? Anita de Waard VP Research Data Collaborations a.dewaard@elsevier.com http://researchdata.elsevier.com/
  • 26. Scopes: Follow the format L#::L#::L#... where L is a letter identifier and # is any number of decimal digits. Example: P1::S1::R3 = Animal Prep 1, Slice 1, Run 3. The letter need not be globally unique, only unique within its chain. Example: P1::S1::E1 (Electrode) is different from P1::S1::R1::E1 (Run-Electrode). Scopes are 1-indexed.
  • 27. Attributes: Each scope has an attributes field that consists of multiple key/value pairs. The keys are unique and not tied to scope (e.g. electrode_name instead of name). A key can be a choice, a scalar (with units), or a free-text field; which one is determined by the ontology tables.
  • 28. Downsides to Flexible Schema: Converting between the flat scopes and a true hierarchy (say in JSON) is rather complicated and led to many errors in the app. It is very easy to get corrupted data in the app. The schema is closely aligned to the way the Lua app did things. A flexible schema was a good choice, but using scopes for hierarchies was not.
  • 29. Raw Data: For use in the data dashboard. Standardized on HDF5. Files are uploaded via FTP, in batches or individually. The username, filename, and metadata within the HDF5 file are used to identify the associated metadata records.
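
The scope format from the backup slides (L#::L#::L#, for example P1::S1::R3) and the conversion from flat scopes to a true hierarchy can be illustrated with a short sketch. This is not the project's app code. The function names, the nested-dict layout, and the sample attribute values are assumptions made for illustration. Only the scope syntax, the 1-indexing rule, and the P1::S1::E1 versus P1::S1::R1::E1 example come from the slides.

```python
import re

# Scope syntax as described on the slides: a capital letter followed by
# decimal digits, with levels chained by "::" (e.g. "P1::S1::R3").
SCOPE_RE = re.compile(r"[A-Z]\d+(?:::[A-Z]\d+)*")


def parse_scope(scope):
    """Split 'P1::S1::R3' into [('P', 1), ('S', 1), ('R', 3)]."""
    if not SCOPE_RE.fullmatch(scope):
        raise ValueError(f"malformed scope: {scope!r}")
    levels = [(part[0], int(part[1:])) for part in scope.split("::")]
    # The slides state that scopes are 1-indexed, so reject index 0.
    if any(index < 1 for _, index in levels):
        raise ValueError(f"scope indices start at 1: {scope!r}")
    return levels


def nest(records):
    """Fold flat (scope, attributes) records into a nested dict.

    Hypothetical layout: every node holds its own 'attributes' and a
    'children' dict keyed by the scope segment (such as 'S1').
    """
    root = {"attributes": {}, "children": {}}
    for scope, attributes in records:
        parse_scope(scope)  # validate before touching the tree
        node = root
        for segment in scope.split("::"):
            node = node["children"].setdefault(
                segment, {"attributes": {}, "children": {}}
            )
        node["attributes"].update(attributes)
    return root


# Sample records; the attribute values are invented for illustration.
records = [
    ("P1", {"mouse_strain": "example-strain"}),
    ("P1::S1::E1", {"electrode_name": "slice electrode"}),
    ("P1::S1::R1::E1", {"electrode_name": "run electrode"}),
]
tree = nest(records)
```

Because each segment is resolved relative to its parent, P1::S1::E1 and P1::S1::R1::E1 end up as distinct nodes, matching the "chain unique" rule. A sketch like this skips the harder cases the slides allude to, such as the reverse conversion and partially corrupted records.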

Editor's Notes

  1. Walk through pieces 1 by 1, also mention that this is very much an uncompleted work in progress