The Culture of Research Data
Peter Murray-Rust,
ContentMine.org and University of Cambridge
LEARN, London, UK 2016-01-29
The technology for Managing Research Data is already here…
…but we need a change of culture
Open Notebook Science
Publishers must be forced to serve us, not control us
Just read the big
letters
He’s got zillions of
slides…
My European Heroes
Young People (ContentMine)
NEELIE KROES
The Right to Read is the Right to Mine
http://contentmine.org
Themes
• Highly domain-dependent (chem, cryst, phylo)
• Requires community and centrality
• University repositories are NOT the solution
• Openness makes it dramatically easier/better
• The publisher-academic complex is a major
problem.
• Infrastructure must be open and under our
control
WE pay for scholarly
publications that WE
can’t read
The Publisher-Academic complex [1]
[Diagram relating Publishers and Academia to Taxpayer, Student and Researcher. Flow labels: “Glory+?”, “$$, MS review”, “$$”, “in-kind”.]
[1] The Military-Industrial-Academic complex (1961)
(Dwight D Eisenhower, US President)
Elsevier wants to control Open Data
[asked by Michelle Brook]
Some topics
• GitHub / software management informs data management
• Open notebook science
• Open source malaria + LabTrove
• Open phylogenetics
• Computational chemistry
• Crystallography
• Early career researchers can change the world, if
we support them.
• ContentMining (TDM) as research
• Are “publishers” tyrants or servants?
Every Research
Data Manager
should be using Git
Why I reposit software in GitHub
I WANT TO!!!
BETTER
QUICKER
SECURE
AUDIT
BACKTRACKABLE
EASY
get collaborators
Most early career software creators have repos
How many people have USED Git?
Free/Open Software Development
[Diagram: a central CODE REPOSITORY shared with the world community; CODE is rewritten, validated, forked and re-used. Named examples: GitHub, BitBucket, StackOverflow, Apache. Further labels: “inspires”, OSI.]
Example: ContentMine at
http://github.com/ContentMine/quickscrape
BORN-OPEN-SOURCE
NO WALLS
GIT housekeeps AUTOMATICALLY, eternally
Daily record of commits and merges.
Can backtrack to ANY previous version
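The idea that a repository keeps every version and can backtrack to any of them can be illustrated with a toy snapshot store. This is a minimal sketch of the principle only, not Git's implementation or API; the class `TinyRepo`, its methods, and the sample measurements are hypothetical and made up for illustration.

```python
import hashlib


class TinyRepo:
    """Toy snapshot store (hypothetical; illustrates the idea, not Git itself)."""

    def __init__(self):
        self.objects = {}   # snapshot id -> (parent id, message, content)
        self.head = None    # id of the most recent snapshot

    def commit(self, content: str, message: str) -> str:
        # The id is a hash of parent + message + content, so each
        # snapshot is tied to the history that precedes it.
        payload = f"{self.head}\n{message}\n{content}".encode("utf-8")
        commit_id = hashlib.sha1(payload).hexdigest()
        self.objects[commit_id] = (self.head, message, content)
        self.head = commit_id
        return commit_id

    def checkout(self, commit_id: str) -> str:
        # Any earlier snapshot stays retrievable by its id.
        return self.objects[commit_id][2]

    def log(self):
        # Walk parent links from the newest snapshot back to the first.
        history, current = [], self.head
        while current is not None:
            parent, message, _ = self.objects[current]
            history.append((current[:7], message))
            current = parent
        return history


repo = TinyRepo()
# Made-up example values, not real data.
first = repo.commit("melting point: 122 C", "initial measurement")
second = repo.commit("melting point: 123 C", "recalibrated instrument")
print(repo.checkout(first))                      # the earlier version is still there
print([message for _, message in repo.log()])    # newest first
```

Nothing is overwritten: a new commit adds a snapshot and moves `head`, which is why the full audit trail comes for free.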
Community involvement
https://github.com/ContentMine/quickscrape/pulls
Contributions from
People “outside project”
Compile Fail
Inactive
Fail Tests
Pass Tests
Continuous Integration (Jenkins)
Every time I commit a change
50 projects are recompiled
and tested.
Impossible to do this manually!
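The statuses shown for Continuous Integration (Compile Fail, Inactive, Fail Tests, Pass Tests) can be sketched as a loop that rebuilds and retests every project after a commit. This is a hedged illustration, not Jenkins's API: the `Project` type, its `build` and `test` callables, and the project names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Project:
    name: str
    build: Callable[[], bool]   # returns True if compilation succeeds
    test: Callable[[], bool]    # returns True if all tests pass
    active: bool = True


def run_ci(projects: List[Project]) -> Dict[str, str]:
    """Rebuild and retest every project; report one status per project."""
    report = {}
    for project in projects:
        if not project.active:
            report[project.name] = "Inactive"
        elif not project.build():
            report[project.name] = "Compile Fail"   # tests are skipped if the build fails
        elif not project.test():
            report[project.name] = "Fail Tests"
        else:
            report[project.name] = "Pass Tests"
    return report


# Hypothetical projects standing in for real builds.
projects = [
    Project("parser", build=lambda: True, test=lambda: True),
    Project("scraper", build=lambda: True, test=lambda: False),
    Project("viewer", build=lambda: False, test=lambda: True),
    Project("archive", build=lambda: True, test=lambda: True, active=False),
]
for name, status in run_ci(projects).items():
    print(f"{name}: {status}")
```

A real CI server triggers this loop on every commit, which is what makes checking 50 projects each time practical.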
Software management
Is a success!
Research DATA
management
Is a mess.
Traditional Research and Publication
“Lab” work
paper/thesis
Write
rewrite
Re-experiment
publish
???
Validation??
DATA
output “belongs”
to publisher
Every process is LOSSY
How NOT to publish data
HT Henry Rzepa
From Henry Rzepa:
this article http://doi.org/10.1126/science.aad6252
which provides a 22 Mbyte PDF of data (mostly bitmaps of NMR
spectra) and comes in at 404 pages long. [1]
But this one http://doi.org/10.1021/jacs.5b05902 [comp chem]
is 505 pages long (the current record holder?)
[1] DATA Behind paywall
505 pages PDF, was a
machine-readable log file
that could and
should have been in a repo
Computational
Chemistry
MORE of the PDF
DATA Destruction
Blind humans and
Machines cannot
read this
ALWAYS put your
(computational,
instrumental,
observational)
data directly into a
repository
some visionaries…
JD Bernal’s 1965 vision
However large an array of facts, however rapidly they
accumulate, it is possible to keep them in order and to
extract from time to time digests containing the most
generally significant information, while indicating how to
find those items of specialized interest. To do so, however,
requires the will and the means. (Bernal, 1965)
Quoted by PMR in http://journals.iucr.org/d/issues/1998/06/01/ba0011/ba0011.pdf
PMR’s Tribute
Planned Memorial Meeting
July 14th 2014 Cambridge
OPEN NOTEBOOK SCIENCE
https://en.wikipedia.org/wiki/Bermuda_Principles
• Automatic release of sequence assemblies larger than 1
kb (preferably within 24 hours).
• Immediate publication of finished annotated
sequences.
• Aim to make the entire sequence freely available in the
public domain for both research and development in
order to maximise benefits to society.
HUMAN GENOME project used
Open Notebooks
Without
Open is FASTER, BETTER,
MORE, MORE EFFICIENT
Open Notebook Science, ONS
Jean-Claude Bradley 2006
All data immediately
available to all. NO
INSIDER INFORMATION.
TOOLS
Open Notebook Science
Open
engineered
repository
World
community
INSTRUMENT
validate
merge
MODEL
CODE
DATA
DATA
knowledge
calibrate
Problems are solved communally;
Nothing is needlessly duplicated; “publication” is
continuous; data are SEMANTIC
Machines
and humans
Working
together
Here are three
examples
Mat Todd (Sydney) and MANY collaborators
http://opensourcemalaria.org/ (Chrome for interactivity)
Mat Todd, Univ Sydney, runs an Open Notebook community
to create new antimalarials.
Notebook managed on Git.
Interactive OPEN chemical search tool from cheminfo.org
Interactive OPEN molecular display Jmol (Bob Hanson et al)
data is associated with the proposed
scientific endeavour prior to or at the
point of creation rather than by
annotating the data with commentary
after the experiment has taken place
University of Southampton
Data thrives on Community
Henry Rzepa does Open
Notebook Computational
Chemistry…
http://www.rzepa.net/blog/?p=14272
This is a current open notebook discussion,
http://www.ch.imperial.ac.uk/rzepa/blog/?p=15552 (see comments,
currently 67).
… on his blog
COMMUNITY
INVOLVEMENT
Crystallography – a model for Data
Management
• Pro-active, friendly international community
• Committed active International Union (IUCr)
• Data publication valued (1960-present)
• Community develops semantics/dictionaries
• Committed volunteer software innovators
• Heavily Open approach
• Massive and valuable re-use of data
• Culture of validation/reproducibility
• Respect and credit for tool development
IUCr DICTIONARIES
IUCr VALIDATION
CRITERIA/TOOLS
DATA
PUBLICLY
VALIDATED
TRUSTABLE
SCIENCE
Where to reposit published
crystallography?
Proteins -> PDB, Open
BUT
Inorganics -> ICSD Closed
Organics -> Cambridge (CCDC) Closed
SO
The community has built a Crystallography Open
Database
Restrictions on Re-use of Crystallographic data
NOTE: The CCDC is based on data contributed by
scientists as part of publication and validation
Crystallographic data from
publications now belongs to CCDC
Open Source and Open Data
www.crystallography.net
Interactive OPEN crystal search tool
Panton Fellows (Early Career Researchers)
Panton Principles of Open Scientific Data 2010
Publish data openly
(CC0) and record
your wishes
Sophie Kershaw, Panton Fellow:
Doctoral Training in Oxford
Sophie Kershaw, Panton Fellow
Rotation-Based Learning (RBL)
Phase 1: Initiator
• No communication
permitted between groups
• Attempt to reproduce
existing literature
• Deliver a coherent research
story by the end of Phase 1
Phase 2: Successor
• Communication between
groups still prohibited
• Validate and develop the
inherited research story
• Critique your predecessors
• Role of research producer vs. research user
• Can this approach help to foster awareness of reproducibility issues?
Throughout Phases 1 & 2:
• Daily lectures on open
science culture & techniques
• First-hand application to own
research work
• Version control using GitHub
• Daily group supervision
So first-year grad
students should be
trained by…
… third-year graduate
students
So we can now
legally contentmine
the whole literature
in the UK…
NORMA
Ross Mounce and PMR
created a SuperTree of Life
for microorganisms!
…Yes! And in UK
we are starting
to do it…
http://www.slideshare.net/rossmounce/the-pluto-project-ievobio-2014
https://en.wikipedia.org/wiki/Tree_of_life CC BY-SA
Aves
Apterygidae
Marsupialia
Monotremata
Mammalia
Reptilia
Amphibia
Arthropoda
Myriapoda
Okapia johnstoni
Pyrus
Stuffed Tree of Life
Authors don’t deposit data (Ross Mounce)
And we did it as Open Notebook Science
all data and code on GitHub
Discussion on public Discourse Tool
NO INSIDER KNOWLEDGE
4300 images in GitHub
“Root”
We analysed every pixel
Many diagrams had author errors
Supertree created from 4300 papers
Supertree for 924 species
Tree
So why not Git for Data?
DAT is Git for Data!!
DAT! Queen Mary UL reposits DNA
The John S. and James L. Knight Foundation is an American private, non-profit foundation
dedicated to supporting "transformational ideas that promote quality journalism, advance
media innovation, engage communities and foster the arts."
DAT supports public data
@Senficon (Julia Reda): Text & Data mining in times of
#copyright maximalism:
"Elsevier stopped me doing my research"
http://onsnetwork.org/chartgerink/2015/11/16/elsevier-stopped-me-doing-my-research/ … #opencon #TDM
Elsevier stopped me doing my research
Chris Hartgerink
I am a statistician interested in detecting potentially problematic research such as data fabrication,
which results in unreliable findings and can harm policy-making, confound funding decisions, and
hampers research progress.
To this end, I am content mining results reported in the psychology literature. Content mining the
literature is a valuable avenue of investigating research questions with innovative methods. For
example, our research group has written an automated program to mine research papers for errors in
the reported results and found that 1/8 papers (of 30,000) contains at least one result that could
directly influence the substantive conclusion [1].
In new research, I am trying to extract test results, figures, tables, and other information reported in
papers throughout the majority of the psychology literature. As such, I need the research papers
published in psychology that I can mine for these data. To this end, I started ‘bulk’ downloading research
papers from, for instance, Sciencedirect. I was doing this for scholarly purposes and took into account
potential server load by limiting the amount of papers I downloaded per minute to 9. I had no intention
to redistribute the downloaded materials, had legal access to them because my university pays a
subscription, and I only wanted to extract facts from these papers.
Full disclosure, I downloaded approximately 30GB of data from Sciencedirect in approximately 10 days.
This boils down to a server load of 0.0021GB/[min], 0.125GB/h, 3GB/day.
Approximately two weeks after I started downloading psychology research papers, Elsevier notified my
university that this was a violation of the access contract, that this could be considered stealing of
content, and that they wanted it to stop. My librarian explicitly instructed me to stop downloading
(which I did immediately), otherwise Elsevier would cut all access to Sciencedirect for my university.
I am now not able to mine a substantial part of the literature, and because of this Elsevier is directly
hampering me in my research.
[1] Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2015). The
prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 1–22.
doi: 10.3758/s13428-015-0664-2
Chris Hartgerink’s blog post
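The server-load figures quoted in the post follow directly from its two stated totals (approximately 30 GB in approximately 10 days). A short check using only those figures; the variable names are illustrative.

```python
total_gb = 30.0     # approximate total stated in the post
days = 10.0         # approximate duration stated in the post

gb_per_day = total_gb / days
gb_per_hour = gb_per_day / 24
gb_per_minute = gb_per_hour / 60

print(f"{gb_per_day:.0f} GB/day, {gb_per_hour:.3f} GB/h, {gb_per_minute:.4f} GB/min")
# prints "3 GB/day, 0.125 GB/h, 0.0021 GB/min"
```

The result matches the 3 GB/day, 0.125 GB/h and 0.0021 GB/min reported in the post.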
Some Children
of the Digital Enlightenment
• David Carroll & Joe McArthur: OAButton
• Rayna Stamboliyska & Pierre-Carl Langlais
• Jon Tennant
• Ross Mounce
• Jenny Molloy
• Erin McKiernan
• Jack Andraka
• Michelle Brook
• Heather Piwowar
• TheContentMine Team
• Rufus Pollock
• Jonathan Gray
• Sophie Kay
Jean-Claude Bradley [1], a chemist,
developed Open Notebook Science:
making the entire primary record of a
research project publicly available
online as it is recorded. (WP)
J-C promoted these ideas with
UNDERGRADUATE scientists.
[1] Unfortunately J-C died in 2014;
we held a memorial meeting in
Cambridge
Sophie Kay
Unused slides…
OPEN CLOSED
Zenodo Figshare
Git
Dat
OpenOffice Word, PPT
LabTrove, cheminfo.org Chemdraw
CrystallographyOpenDB Cambridge Cryst data Centre
WriteLatex / Overleaf
ReadCube, Symplectic,
This is a current open notebook discussion, http://www.ch.imperial.ac.uk/rzepa/blog/?p=15552
(see comments, currently 67).
This is an earlier one, http://www.rzepa.net/blog/?p=14272 (with 86 comments) and also
incorporates Jsmol to visualise all the data
This one starts discussion as an open notebook http://www.ch.imperial.ac.uk/rzepa/blog/?p=1211
with the resulting formal publication at 10.1002/jcc.23985
This was the original open notebook post http://www.ch.imperial.ac.uk/rzepa/blog/?p=984 with
the resulting formal publication at 10.1038/NCHEM.596
This one incorporates open data into its citation list
http://www.ch.imperial.ac.uk/rzepa/blog/?p=15505 and is also an open notebook follow up to my
PhD thesis work, formally published in 1975 or so, thus operating in reverse to the above.
This shows some end outcomes: http://www.ch.imperial.ac.uk/rzepa/blog/?p=15313
This shows the principles: http://www.ch.imperial.ac.uk/rzepa/blog/?p=10972
This is an introductory tutorial http://www.ch.imperial.ac.uk/rzepa/blog/?p=14454
This is a critique http://www.ch.imperial.ac.uk/rzepa/blog/?p=13826
This is “convincing case” http://www.ch.imperial.ac.uk/rzepa/blog/?p=13248
This is about metadata http://www.ch.imperial.ac.uk/rzepa/blog/?p=12932
And its use http://www.ch.imperial.ac.uk/rzepa/blog/?p=12526
You have seen this data nightmare before: http://www.ch.imperial.ac.uk/rzepa/blog/?p=12728
This is about ORCID http://www.ch.imperial.ac.uk/rzepa/blog/?p=12513
Open Source software inspires Open Science
Jean-Claude Bradley 2006
Ross Mounce (Bath), Panton Fellow
• Sharing research data:
http://www.slideshare.net/rossmounce
• How-to figures from PLOS/One [link]:
Ross shows how to bring figures to life:
• PLOSOne at http://bit.ly/PLOStrees
• PLOS at http://bit.ly/phylofigs (demo)
The culture of researchData

Contenu connexe

Tendances

Open Data and Open Science
Open Data and Open ScienceOpen Data and Open Science
Open Data and Open ScienceTheContentMine
 
Paradise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to MineParadise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to Minepetermurrayrust
 
The "social" side of digital science
The "social" side of digital scienceThe "social" side of digital science
The "social" side of digital scienceKaitlin Thaney
 
Principles and practice of Open Science
Principles and practice of Open SciencePrinciples and practice of Open Science
Principles and practice of Open ScienceTheContentMine
 
Content Mining for Machines and Humans
Content Mining for Machines and HumansContent Mining for Machines and Humans
Content Mining for Machines and Humanspetermurrayrust
 
Ethnography for impact: a new way of exploring user experience in libraries
Ethnography for impact: a new way of exploring user experience in librariesEthnography for impact: a new way of exploring user experience in libraries
Ethnography for impact: a new way of exploring user experience in librariesAndy Priestner
 
Uksg Social Science
Uksg Social ScienceUksg Social Science
Uksg Social ScienceTony Hirst
 
Open science / open research
Open science / open researchOpen science / open research
Open science / open researchheila1
 
Open Sesame (and other open movements)
Open Sesame (and other open movements)Open Sesame (and other open movements)
Open Sesame (and other open movements)Dorothea Salo
 
WikiFactMine: Science for Everyone
WikiFactMine: Science for EveryoneWikiFactMine: Science for Everyone
WikiFactMine: Science for Everyonepetermurrayrust
 
5 steps to using open access in the classroom 11 9 2011
5 steps to using open access in the classroom 11 9 2011 5 steps to using open access in the classroom 11 9 2011
5 steps to using open access in the classroom 11 9 2011 Elizabeth Brown
 
General presentation of the LiquidPub project
General presentation of the LiquidPub projectGeneral presentation of the LiquidPub project
General presentation of the LiquidPub projectAliaksandr Birukou
 
Web 2.0 for Biologists–Are any of the current tools worth using?
Web 2.0 for Biologists–Are any of the current tools worth using?Web 2.0 for Biologists–Are any of the current tools worth using?
Web 2.0 for Biologists–Are any of the current tools worth using?dacrotty
 
Learn to speak open
Learn to speak openLearn to speak open
Learn to speak openLilian Juma
 
Vks Presentation, Jankowski,15 Jan2009, Websites & Books, Near Final
Vks Presentation, Jankowski,15 Jan2009, Websites & Books, Near FinalVks Presentation, Jankowski,15 Jan2009, Websites & Books, Near Final
Vks Presentation, Jankowski,15 Jan2009, Websites & Books, Near FinalNick Jankowski
 
Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Ontologies for baby animals and robots From "baby stuff" to the world of adul...Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Ontologies for baby animals and robots From "baby stuff" to the world of adul...Aaron Sloman
 
Using Social Media in Canadian Academic Libraries: A 2010 CARL ABRC Libraries...
Using Social Media in Canadian Academic Libraries: A 2010 CARL ABRC Libraries...Using Social Media in Canadian Academic Libraries: A 2010 CARL ABRC Libraries...
Using Social Media in Canadian Academic Libraries: A 2010 CARL ABRC Libraries...CARLsurvey2010
 

Tendances (20)

Open Data and Open Science
Open Data and Open ScienceOpen Data and Open Science
Open Data and Open Science
 
Paradise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to MineParadise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to Mine
 
The "social" side of digital science
The "social" side of digital scienceThe "social" side of digital science
The "social" side of digital science
 
Principles and practice of Open Science
Principles and practice of Open SciencePrinciples and practice of Open Science
Principles and practice of Open Science
 
Content Mining for Machines and Humans
Content Mining for Machines and HumansContent Mining for Machines and Humans
Content Mining for Machines and Humans
 
Ethnography for impact: a new way of exploring user experience in libraries
Ethnography for impact: a new way of exploring user experience in librariesEthnography for impact: a new way of exploring user experience in libraries
Ethnography for impact: a new way of exploring user experience in libraries
 
Uksg Social Science
Uksg Social ScienceUksg Social Science
Uksg Social Science
 
Open science / open research
Open science / open researchOpen science / open research
Open science / open research
 
Open Sesame (and other open movements)
Open Sesame (and other open movements)Open Sesame (and other open movements)
Open Sesame (and other open movements)
 
WikiFactMine: Science for Everyone
WikiFactMine: Science for EveryoneWikiFactMine: Science for Everyone
WikiFactMine: Science for Everyone
 
5 steps to using open access in the classroom 11 9 2011
5 steps to using open access in the classroom 11 9 2011 5 steps to using open access in the classroom 11 9 2011
5 steps to using open access in the classroom 11 9 2011
 
Making Theses USEFUL
Making Theses USEFULMaking Theses USEFUL
Making Theses USEFUL
 
Plosslides
PlosslidesPlosslides
Plosslides
 
PLOS slides
PLOS slidesPLOS slides
PLOS slides
 
General presentation of the LiquidPub project
General presentation of the LiquidPub projectGeneral presentation of the LiquidPub project
General presentation of the LiquidPub project
 
Web 2.0 for Biologists–Are any of the current tools worth using?
Web 2.0 for Biologists–Are any of the current tools worth using?Web 2.0 for Biologists–Are any of the current tools worth using?
Web 2.0 for Biologists–Are any of the current tools worth using?
 
Learn to speak open
Learn to speak openLearn to speak open
Learn to speak open
 
Vks Presentation, Jankowski,15 Jan2009, Websites & Books, Near Final
Vks Presentation, Jankowski,15 Jan2009, Websites & Books, Near FinalVks Presentation, Jankowski,15 Jan2009, Websites & Books, Near Final
Vks Presentation, Jankowski,15 Jan2009, Websites & Books, Near Final
 
Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Ontologies for baby animals and robots From "baby stuff" to the world of adul...Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Ontologies for baby animals and robots From "baby stuff" to the world of adul...
 
Using Social Media in Canadian Academic Libraries: A 2010 CARL ABRC Libraries...
Using Social Media in Canadian Academic Libraries: A 2010 CARL ABRC Libraries...Using Social Media in Canadian Academic Libraries: A 2010 CARL ABRC Libraries...
Using Social Media in Canadian Academic Libraries: A 2010 CARL ABRC Libraries...
 

En vedette

ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!petermurrayrust
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neurosciencepetermurrayrust
 
ContentMine (TDM) at JISC Digifest
ContentMine (TDM) at JISC DigifestContentMine (TDM) at JISC Digifest
ContentMine (TDM) at JISC Digifestpetermurrayrust
 
Content Mining of Science in Europe
Content Mining of Science in EuropeContent Mining of Science in Europe
Content Mining of Science in Europepetermurrayrust
 
ContentMine Architecture
ContentMine ArchitectureContentMine Architecture
ContentMine Architecturepetermurrayrust
 
A Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and WikidataA Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and Wikidatapetermurrayrust
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHpetermurrayrust
 
High throughput mining of the plant-science literature
High throughput mining of the plant-science literatureHigh throughput mining of the plant-science literature
High throughput mining of the plant-science literaturepetermurrayrust
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literaturepetermurrayrust
 
Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literatureAutomatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literaturepetermurrayrust
 
Disruptive Communities and Technology
Disruptive Communities and TechnologyDisruptive Communities and Technology
Disruptive Communities and Technologypetermurrayrust
 
ContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and thesesContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and thesespetermurrayrust
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKpetermurrayrust
 

En vedette (15)

Csvconf
CsvconfCsvconf
Csvconf
 
ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 
ContentMine (TDM) at JISC Digifest
ContentMine (TDM) at JISC DigifestContentMine (TDM) at JISC Digifest
ContentMine (TDM) at JISC Digifest
 
Content Mining of Science in Europe
Content Mining of Science in EuropeContent Mining of Science in Europe
Content Mining of Science in Europe
 
ContentMine Architecture
ContentMine ArchitectureContentMine Architecture
ContentMine Architecture
 
A Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and WikidataA Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and Wikidata
 
Cochrane workshop2016
Cochrane workshop2016Cochrane workshop2016
Cochrane workshop2016
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIH
 
High throughput mining of the plant-science literature
High throughput mining of the plant-science literatureHigh throughput mining of the plant-science literature
High throughput mining of the plant-science literature
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature
 
Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literatureAutomatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature
 
Disruptive Communities and Technology
Disruptive Communities and TechnologyDisruptive Communities and Technology
Disruptive Communities and Technology
 
ContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and thesesContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and theses
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UK
 

Similaire à The culture of researchData

The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData TheContentMine
 
The Culture of Research Data, by Peter Murray-Rust
The Culture of Research Data, by Peter Murray-RustThe Culture of Research Data, by Peter Murray-Rust
The Culture of Research Data, by Peter Murray-RustLEARN Project
 
Early Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be BraveEarly Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be Bravepetermurrayrust
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in NeuroscienceTheContentMine
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in NeuroscienceTheContentMine
 
Early Career Reseachers and Open Healthcare
Early Career Reseachers and Open HealthcareEarly Career Reseachers and Open Healthcare
Early Career Reseachers and Open Healthcarepetermurrayrust
 
Rapid biomedical search
Rapid biomedical search Rapid biomedical search
Rapid biomedical search petermurrayrust
 
ContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific LiteratureContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific Literaturepetermurrayrust
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?LEARN Project
 
Open sciencerefresher2019
Open sciencerefresher2019Open sciencerefresher2019
Open sciencerefresher2019heila1
 
How to Execute A Research Paper
How to Execute A Research PaperHow to Execute A Research Paper
How to Execute A Research PaperAnita de Waard
 
The Evolution of e-Research: Machines, Methods and Music
The Evolution of e-Research: Machines, Methods and MusicThe Evolution of e-Research: Machines, Methods and Music
The Evolution of e-Research: Machines, Methods and MusicDavid De Roure
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and  Medicine from the scholarly literatureAutomatic Extraction of Science and  Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literaturepetermurrayrust
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literatureAutomatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literatureTheContentMine
 
Big Data and ContentMining for Libraries
Big Data and ContentMining for LibrariesBig Data and ContentMining for Libraries
Big Data and ContentMining for Librariespetermurrayrust
 
Open software and knowledge for MIOSS
Open software and knowledge for MIOSS Open software and knowledge for MIOSS
Open software and knowledge for MIOSS TheContentMine
 
Open software and knowledge for MIOSS
Open software and knowledge for MIOSSOpen software and knowledge for MIOSS
Open software and knowledge for MIOSSpetermurrayrust
 

Similaire à The culture of researchData (20)

The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData
 
The Culture of Research Data, by Peter Murray-Rust
The Culture of Research Data, by Peter Murray-RustThe Culture of Research Data, by Peter Murray-Rust
The Culture of Research Data, by Peter Murray-Rust
 
Open Notebook Science
Open Notebook ScienceOpen Notebook Science
Open Notebook Science
 
The Culture of Research Data

  • 1. The Culture of Research Data Peter Murray-Rust, ContentMine.org and UniversityOfCambridge LEARN, London, UK 2016-01-29 The technology for Managing Research Data is already here… …but we need a change of culture Open Notebook Science Publishers must be forced to serve us, not control us
  • 2. Just read the big letters He’s got zillions of slides…
  • 3. My European Heroes Young People(ContentMine) NEELIE KROES
  • 4. The Right to Read is the Right to Mine http://contentmine.org
  • 5. Themes • Highly domain-dependent (chem, cryst, phylo) • Requires community and centrality • University repositories are NOT the solution • Openness makes it dramatically easier/better • The publisher-academic complex is a major problem. • Infrastructure must be open and under our control
  • 6. WE pay for scholarly publications that WE can’t read [1] The Military-Industrial-Academic complex (1961) (Dwight D Eisenhower, US President) Publishers Academia Glory+? $$, MS review Taxpayer Student Researcher $$ $$ in-kind The Publisher-Academic complex[1]
  • 7. Elsevier wants to control Open Data [asked by Michelle Brook]
  • 8. Some topics • Github / software mgt informs data mgt • Open notebook science • Open source malaria + LabTrove • Open phylogenetics • Computational chemistry • Crystallography • Early career researchers can change the world, if we support them. • ContentMining (TDM) as research • Are “publishers” tyrants or servants?
  • 10. Why I reposit software in GitHub I WANT TO!!! BETTER QUICKER SECURE AUDIT BACKTRACKABLE EASY get collaborators Most early career software creators have repos How many people have USED Git?
  • 11. Free/Open Software Development CODE REPOSITORY World community CODE rewrite validate CODE fork CODE Re-use CODE Re-use Github, BitBucket StackOverflow, Apache inspires OSI Example: ContentMine at http://github.com/ContentMine/quickscrape BORN-OPEN-SOURCE NO WALLS
  • 12. GIT housekeeps AUTOMATICALLY, eternally Daily record of commits and Merges. Can backtrack to ANY Previous version
  • 14. Compile Fail Inactive Fail Tests Pass Tests Continuous Integration (Jenkins) Every time I commit a change 50 projects are recompiled and tested. Impossible to do this manually!
  • 15. Software management Is a success! Research DATA management Is a mess.
  • 16. Traditional Research and Publication “Lab” work paper/thesis Write rewrite Re-experiment publish ??? Validation?? DATA output “belongs” to publisher Every process is LOSSY
  • 17. How NOT to publish data HT Henry Rzepa From Henry Rzepa: this article http://doi.org/10.1126/science.aad6252 which provides a 22 Mbyte PDF of data (mostly bitmaps of NMR spectra) and comes in at 404 pages long. [1] But this one http://doi.org/10.1021/jacs.5b05902 [comp chem] is 505 pages long (the current record holder?) [1] DATA Behind paywall
  • 18. 505 pages PDF, was a machine-readable log file that could and should have been in a repo Computational Chemistry
  • 19. MORE of the PDF DATA Destruction Blind humans and Machines cannot read this
  • 22. JD Bernal’s 1965 vision However large an array of facts, however rapidly they accumulate, it is possible to keep them in order and to extract from time to time digests containing the most generally significant information, while indicating how to find those items of specialized interest. To do so, however, requires the will and the means. (Bernal, 1965) Quoted by PMR in http://journals.iucr.org/d/issues/1998/06/01/ba0011/ba0011.pdf
  • 23. PMR’s Tribute Planned Memorial Meeting July 14th 2014 Cambridge OPEN NOTEBOOK SCIENCE
  • 24. https://en.wikipedia.org/wiki/Bermuda_Principles • Automatic release of sequence assemblies larger than 1 kb (preferably within 24 hours). • Immediate publication of finished annotated sequences. • Aim to make the entire sequence freely available in the public domain for both research and development in order to maximise benefits to society. HUMAN GENOME project used Open Notebooks Without
  • 25. Open is FASTER, BETTER, MORE, MORE EFFICIENT
  • 26. Open Notebook Science, ONS Jean-Claude Bradley 2006 All data immediately available to all. NO INSIDER INFORMATION.
  • 27. TOOLS Open Notebook Science Open engineered repository World community INSTRUMENT validate merge MODEL CODE DATA DATA knowledge calibrate Problems are solved communally; Nothing is needlessly duplicated; “publication“ is continuous ; data are SEMANTIC Machines and humans Working together
  • 29. Mat Todd (Sydney) and MANY collaborators http://opensourcemalaria.org/ (Chrome for interactivity) Mat Todd, Univ Sydney, runs an Open Notebook community to create new antimalarials.
  • 31. Interactive OPEN chemical search tool from cheminfo.org
  • 32. Interactive OPEN molecular display Jmol (Bob Hanson et al)
  • 34. Interactive OPEN chemical search tool from cheminfo.org
  • 35. data is associated with the proposed scientific endeavour prior to or at the point of creation rather than by annotating the data with commentary after the experiment has taken place University of Southampton
  • 36. Data thrives on Community
  • 37. Henry Rzepa does Open Notebook Computational Chemistry… http://www.rzepa.net/blog/?p=14272 This is a current open notebook discussion, http://www.ch.imperial.ac.uk/rzepa/blog/?p=15552 (see comments, currently 67). … on his blog
  • 41. Crystallography – a model for Data Management • Pro-active, friendly international community • Committed active International Union(IUCr) • Data publication valued (1960-present) • Community develops semantics/dictionaries • Committed volunteer software innovators • Heavily Open approach • Massive and valuable re-use of data • Culture of validation/reproducibility • Respect and credit for tool development
  • 44. DATA
  • 46. Where to reposit published crystallography? Proteins -> PDB, Open BUT Inorganics -> ICSD Closed Organics -> Cambridge (CCDC) Closed SO The community has built a Crystallography Open Database
  • 47. Restrictions on Re-use of Crystallographic data NOTE: The CCDC is based on data contributed by scientists as part of publication and validation Crystallographic data from publications now belongs to CCDC
  • 48. Open Source and Open Data www.crystallography.net
  • 51. Panton Fellows (Early Career Researchers) Panton Principles of Open Scientific Data 2010 Publish data openly (CC0) and record your wishes
  • 52. Sophie Kershaw, Panton Fellow : Doctoral Training in Oxford
  • 54. Rotation-Based Learning (RBL) Phase 1: Initiator • No communication permitted between groups • Attempt to reproduce existing literature • Deliver a coherent research story by the end of Phase 1 Phase 2: Successor • Communication between groups still prohibited • Validate and develop the inherited research story • Critique your predecessors • Role of research producer vs. research user • Can this approach help to foster awareness of reproducibility issues? Throughout Phases 1 & 2: • Daily lectures on open science culture & techniques • First-hand application to own research work • Version control using GitHub • Daily group supervision
  • 55. … third-year graduate students So first-year grad students should be trained by…
  • 56. So we can now legally contentmine the whole literature in the UK… NORMA Ross Mounce and PMR created a SuperTree of Life for microorganisms! …Yes! And in UK we are starting to do it…
  • 60. Authors don’t deposit data (Ross Mounce)
  • 61. And we did it as Open Notebook Science all data and code on Github Discussion on public Discourse Tool NO INSIDER KNOWLEDGE
  • 62. 4300 images in Github
  • 64. Many diagrams had author errors
  • 65. Supertree created from 4300 papers
  • 66. Supertree for 924 species Tree
  • 67. So why not Git for Data?
  • 68. DAT is Git for Data!!
  • 69. DAT! Queen Mary UL reposits DNA
  • 70. The John S. and James L. Knight Foundation is an American private, non-profit foundation dedicated to supporting "transformational ideas that promote quality journalism, advance media innovation, engage communities and foster the arts."[2] DAT supports public data
  • 71. @Senficon (Julia Reda): Text & Data mining in times of #copyright maximalism: "Elsevier stopped me doing my research" http://onsnetwork.org/chartgerink/2015/11/16/elsevier-stopped-me-doing-my-research/ … #opencon #TDM Elsevier stopped me doing my research Chris Hartgerink
  • 72. I am a statistician interested in detecting potentially problematic research such as data fabrication, which results in unreliable findings and can harm policy-making, confound funding decisions, and hampers research progress. To this end, I am content mining results reported in the psychology literature. Content mining the literature is a valuable avenue of investigating research questions with innovative methods. For example, our research group has written an automated program to mine research papers for errors in the reported results and found that 1/8 papers (of 30,000) contains at least one result that could directly influence the substantive conclusion [1]. In new research, I am trying to extract test results, figures, tables, and other information reported in papers throughout the majority of the psychology literature. As such, I need the research papers published in psychology that I can mine for these data. To this end, I started ‘bulk’ downloading research papers from, for instance, Sciencedirect. I was doing this for scholarly purposes and took into account potential server load by limiting the amount of papers I downloaded per minute to 9. I had no intention to redistribute the downloaded materials, had legal access to them because my university pays a subscription, and I only wanted to extract facts from these papers. Full disclosure, I downloaded approximately 30GB of data from Sciencedirect in approximately 10 days. This boils down to a server load of 0.0021GB/[min], 0.125GB/h, 3GB/day. Approximately two weeks after I started downloading psychology research papers, Elsevier notified my university that this was a violation of the access contract, that this could be considered stealing of content, and that they wanted it to stop. My librarian explicitly instructed me to stop downloading (which I did immediately), otherwise Elsevier would cut all access to Sciencedirect for my university. 
I am now not able to mine a substantial part of the literature, and because of this Elsevier is directly hampering me in my research. [1] Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2015). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 1–22. doi: 10.3758/s13428-015-0664-2 Chris Hartgerink’s blog post
  • 73. Some Children of the Digital Enlightenment • David Carroll & Joe McArthur: OAButton • Rayna Stamboliyska & Pierre-Carl Langlais • Jon Tennant • Ross Mounce • Jenny Molloy • Erin McKiernan • Jack Andraka • Michelle Brook • Heather Piwowar • TheContentMine Team • Rufus Pollock • Jonathan Gray • Sophie Kay Jean-Claude Bradley [1] a chemist developed Open notebook science; making the entire primary record of a research project publicly available online as it is recorded. (WP) J-C promoted these ideas with UNDERGRADUATE scientists. [1] Unfortunately J-C died in 2014; we held a memorial meeting in Cambridge Sophie Kay
  • 77. OPEN CLOSED Zenodo Figshare Git Dat OpenOffice Word, PPT LabTrove, cheminfo.org Chemdraw CrystallographyOpenDB Cambridge Cryst data Centre WriteLatex / Overleaf ReadCube, Symplectic,
  • 78. This is a current open notebook discussion, http://www.ch.imperial.ac.uk/rzepa/blog/?p=15552 (see comments, currently 67). This is an earlier one, http://www.rzepa.net/blog/?p=14272 (with 86 comments) and also incorporates Jsmol to visualise all the data This one starts discussion as an open notebook http://www.ch.imperial.ac.uk/rzepa/blog/?p=1211 with the resulting formal publication at 10.1002/jcc.23985 This was the original open notebook post http://www.ch.imperial.ac.uk/rzepa/blog/?p=984 with the resulting formal publication at 10.1038/NCHEM.596 This one incorporates open data into its citation list http://www.ch.imperial.ac.uk/rzepa/blog/?p=15505 and is also an open notebook follow up to my PhD thesis work, formally published in 1975 or so, thus operating in reverse to the above. This shows some end outcomes: http://www.ch.imperial.ac.uk/rzepa/blog/?p=15313 This shows the principles: http://www.ch.imperial.ac.uk/rzepa/blog/?p=10972 This is an introductory tutorial http://www.ch.imperial.ac.uk/rzepa/blog/?p=14454 This is a critique http://www.ch.imperial.ac.uk/rzepa/blog/?p=13826 This is “convincing case” http://www.ch.imperial.ac.uk/rzepa/blog/?p=13248 This is about metadata http://www.ch.imperial.ac.uk/rzepa/blog/?p=12932 And its use http://www.ch.imperial.ac.uk/rzepa/blog/?p=12526 You have seen this data nightmare before: http://www.ch.imperial.ac.uk/rzepa/blog/?p=12728 This is about ORCID http://www.ch.imperial.ac.uk/rzepa/blog/?p=12513
  • 80. Open Source software inspires Open Science Jean-Claude Bradley 2006
  • 86. Ross Mounce (Bath), Panton Fellow • Sharing research data: http://www.slideshare.net/rossmounce • How-to figures from PLOS/One [link]: Ross shows how to bring figures to life: • PLOSOne at http://bit.ly/PLOStrees • PLOS at http://bit.ly/phylofigs (demo)
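
The server-load figures quoted on slide 72 (Chris Hartgerink’s blog post: roughly 30 GB downloaded in roughly 10 days) can be checked with a few lines of arithmetic. This is only a minimal sketch in Python; the function name `download_rates` is invented here for illustration and does not come from the talk or the blog post.

```python
def download_rates(total_gb, days):
    """Convert a total download volume into average rates.

    Returns a tuple (GB per day, GB per hour, GB per minute),
    assuming the download was spread evenly over the period.
    """
    per_day = total_gb / days
    per_hour = per_day / 24
    per_minute = per_hour / 60
    return per_day, per_hour, per_minute


# Figures from the quoted post: about 30 GB over about 10 days.
per_day, per_hour, per_minute = download_rates(30, 10)
print(f"{per_day:.3f} GB/day, {per_hour:.3f} GB/h, {per_minute:.4f} GB/min")
```

Under the even-spread assumption this reproduces the quoted values of 3 GB/day, 0.125 GB/h and about 0.0021 GB/min.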

Editor's notes

  1. Hi, I’m here to talk about AMI, a data extraction framework and tool. First, I just want to highlight some of the key contributors to the project: Andy for his work on the ChemistryVisitor and Peter for the overall architecture. In this talk, I’m going to stress the importance of data in a specific format and its utility for automated machine processing. Then I’m going to demonstrate AMI’s architecture and the transformation of data as it flows through the process. I’m going to dwell a little on a core format used, Scalable Vector Graphics (SVG), before introducing the concept of visitors, which are pluggable, context-specific data extractors. Next, I’m going to introduce Andy’s ChemVisitor, for extracting semantic chemistry data, along with a few other visitors that can process non-chemistry-specific data. Finally, I will demonstrate some uses of the ChemVisitor within the realm of validation and metabolism.