Cerys Willoughby, University of Southampton
Jeremy Frey, Andrew Milsted, Simon Coles, Colin Bird, Cerys Willoughby, Cameron Neylon and Matthew Todd: “Towards a global open scientific notebook infrastructure”
Panel: Global scientific data infrastructure
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
RDAP13 Cerys Willoughby: Towards a global open scientific notebook infrastructure
1. Towards a global open scientific
notebook infrastructure
Jeremy Frey, Andrew Milsted,
Simon Coles, Colin Bird,
Cerys Willoughby, Cameron Neylon &
Matthew Todd
2. Science is increasingly interdisciplinary
4. Comparison with traditional paper notebooks
Electronic Laboratory Notebooks (ELNs):
• Higher Quality Record
• Natural linking to data and external resources
• Easier Collaboration
• Improved planning
• Improved discussions
• Efficiency gain in production of presentations/reports
• Change the nature of Professor/Student interactions
Communication, Collaboration, Sharing, Linking, Curating
5. Developments in ELN implementation and characteristics
[Timeline figure, 1980 to 2010. Labels: RS/1, PNNL, Smart Tea, LabTrove, Commercial offerings, Web 2.0, Semantics, User focus, Collaboration, Trust in ELNs for IP compliance]
7. How do we communicate?
• Surprisingly difficult to explain what a process involves
• Much of the detail is assumed to be understood and not explicitly discussed
• This is where the misunderstandings usually arise.
“If you can't describe what you are doing as a process, you don't know what you're doing.” W. Edwards Deming
Growing need for the global (virtual) equivalent of the “Tea Room”
14. Open Notebooks
• Troves can be open Read/Comment/Write
– Can control this access so it is your choice
• All contributions attributable (login needed)
– Anonymous contributions not usually enabled
• Open contribution does worry the IT services
– Provides potential pathway for abuse of systems
– Not just our systems
15. Global open scientific notebook
infrastructure
• Global collaboration:
– International
– Interdisciplinary
• Open science
• To ascend the knowledge pyramid, we need
open collaboration and sharing of results
16. We must speed up the knowledge discovery process
All I am saying is that now is the time to
develop the technology to deflect an asteroid
Editor's notes
The talk discusses applications of work that originated in Southampton on the development of electronic laboratory notebooks to support collaborative investigations, illustrated by work undertaken at Southampton, the ISIS neutron facility (Neylon) and the University of Sydney (Todd). The work comes out of e-Science funding (CombeChem Project) from the UK RCUK (Research Councils UK) [e-Science maps to Cyber-Infrastructure in the USA], further developed with funding from the Universities Modernization Fund as collaborative R&D between chemistry, computer science and the library.
The Open Access debate has been high profile, but it is primarily an economic argument. From our perspective the question is: open access to what? We are interested in access to the data, hence the role of data management plans. The Royal Society report is key: it stresses that access to the data is essential to the whole basis of science, because it enables other researchers to build on published work, which is much harder, and can be impossible, if the data is not available (and easier if it is freely available). That holds only if the data is comprehensible, so intelligent access is highlighted as necessary (i.e. the importance of metadata).
Infrastructure needs to support the collection and curation of data for high-quality dissemination with context and provenance. The infrastructure parallels the DIKW (Data, Information, Knowledge, Wisdom) hierarchy.
Having the ELN leads to changes in behaviour.
Development of ELNs involves a trade-off in the effort devoted to Semantics, Usability and IP, with each built up over time, as shown by our Smart Tea and LabTrove projects.
The LabTrove system is designed to be easy to use for both open and closed projects. It allows and encourages the use of metadata but does not require or enforce it, an approach needed for adoption. It is Open Source software, with hosting and advice services.
Skip this slide – LabTrove was further developed under the SRF project
Process is important! As important as the Data. Need to describe as we can’t all “visit” – global tea room [Chemists are big on tea rooms]
Images are important, as is being able to add sketch comments as well as text comments, and highly linked notes. For example, from a record (post) about a substrate one can trace which processes used that substrate and what results were then produced, so if it transpires that there was an issue with the material, the consequences can be readily traced.
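The tracing described in this note can be pictured as following links forward from one post to everything derived from it. The sketch below is illustrative only: the post identifiers and the `downstream` helper are hypothetical and are not LabTrove's data model or API. It assumes each link means "the first post was used to produce the second".

```python
from collections import defaultdict, deque

# Hypothetical post identifiers; each pair reads "source was used to produce target".
links = [
    ("substrate-A", "reaction-1"),
    ("substrate-A", "reaction-2"),
    ("reaction-1", "nmr-result-1"),
    ("reaction-2", "yield-result-2"),
    ("substrate-B", "reaction-3"),
]

def downstream(start, links):
    """Return every post reachable from `start` by following links forward."""
    graph = defaultdict(list)
    for source, target in links:
        graph[source].append(target)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)

# If substrate-A turns out to be faulty, these posts need review.
affected = downstream("substrate-A", links)
print(affected)
```

A breadth-first walk is used so that every process and result that depends on the suspect material is found, whether directly or through intermediate steps.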
Computational processes can “blog” as well. A Matlab script can be run from a publish script so that all aspects (data, code and figure output) are added to a Trove, giving full provenance of a figure or result and keeping a clear record of what material generated what outputs. This is very useful once students have left and figures need modifying for a paper.
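As a rough illustration of what such a provenance record might contain, the sketch below bundles the code, input data and output figure for one result into a single post-like record, identifying each by a SHA-256 hash. It is written in Python rather than Matlab, and the `build_post` function, the field names and the example title are all assumptions made for illustration; they do not describe the actual publish script or LabTrove's post format.

```python
import hashlib
import json
from datetime import datetime, timezone

def sha256_hex(content: bytes) -> str:
    """Hex digest that identifies one exact version of an artifact."""
    return hashlib.sha256(content).hexdigest()

def build_post(title: str, code: bytes, data: bytes, figure: bytes) -> dict:
    """Bundle code, input data and output figure into one provenance record.

    Hypothetical structure: the field names are illustrative only.
    """
    artifacts = {"code": code, "data": data, "figure": figure}
    return {
        "title": title,
        "created": datetime.now(timezone.utc).isoformat(),
        "artifacts": {
            name: {"sha256": sha256_hex(content), "bytes": len(content)}
            for name, content in artifacts.items()
        },
    }

# Placeholder contents stand in for a real script, data file and image.
post = build_post(
    "Figure regeneration (example)",
    code=b"plot(x, y)",
    data=b"1,2\n3,4\n",
    figure=b"placeholder-image-bytes",
)
print(json.dumps(post, indent=2))
```

Because the hashes change whenever the code or data change, a later reader can check whether a figure in a paper was produced from the versions recorded in the notebook.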
Comments on computational models: in this case GODIVA is a way to show ocean models over the web (University of Reading). With LabTrove added, people can comment on geo-coded regions of the model results and have the video in the post, with metadata taken from the models and put in the Trove.
Just shows the use in the x-ray project… computationally intensive image reconstruction in a complex, multi-disciplinary project, use of timelines, I have this to show that my work is grounded in physical science as well as computer science. You may want to stress your background in usability which is as we know so important to actually making this all work
Examples from USyd of the Open Notebook science use in malaria drugs. Enables global collaboration, link back to notebook from the publications, has industrial participation, links with other platforms (wiki etc). Pictures of the research are really useful.
Social media to disseminate open research, links to Twitter, and perhaps Facebook etc, make sure metadata is good enough for search engines to find, perhaps need some specialist metadata for research findings, researcher and funder ids are certainly useful!
Attribution requires similar infrastructure to security, so switching between Open Notebook Science, Open on Publication and Closed (i.e. industrially funded private research) is not hard. In industry the work may not be public but often does need to be shared within the company, so similar issues to Open Science apply.
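One way to see why switching modes is cheap once logins exist is to treat the mode as a single setting checked at read time, with every contribution tied to a logged-in identity. The sketch below is a minimal illustration under that assumption; the `Mode` enum, the `can_read` and `can_contribute` functions and the `open_write` flag are hypothetical names, not LabTrove's permission model.

```python
from enum import Enum
from typing import Optional

class Mode(Enum):
    OPEN_NOTEBOOK = "open notebook science"
    OPEN_ON_PUBLICATION = "open on publication"
    CLOSED = "closed"

def can_read(mode: Mode, is_member: bool, published: bool) -> bool:
    """Project members can always read; outsiders depend on the mode."""
    if is_member:
        return True
    if mode is Mode.OPEN_NOTEBOOK:
        return True
    if mode is Mode.OPEN_ON_PUBLICATION:
        return published
    return False  # Mode.CLOSED

def can_contribute(user_id: Optional[str], is_member: bool, open_write: bool) -> bool:
    """Contributions must be attributable, so an identity is always required.

    `open_write` is an assumed per-notebook flag that lets logged-in
    non-members comment or write.
    """
    if user_id is None:  # anonymous: nothing to attribute the contribution to
        return False
    return is_member or open_write

# Switching a notebook from closed to open is a change of one setting.
print(can_read(Mode.CLOSED, is_member=False, published=False))
print(can_read(Mode.OPEN_NOTEBOOK, is_member=False, published=False))
```

The same identity check serves both purposes: it restricts access in the closed case and records who contributed in the open case.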
Well, more rapidly and more efficiently. Openness is viewed by many as a problem, however, when it comes to establishing reputation, advancing a career or potential financial gain. Open does not mean free: it is perhaps free at the point of use, but someone has paid for the work and is paying to maintain the access. Could comment that the collective action of the long tail of laboratory science needs the global collaboration that semantics plus the web (not necessarily the formal semantic web) provides.
Attitudes to undertaking research need to change so that, when data is collected, the assumption is that it will be shared (at some point) and that collaboration is essential for rapid progress. Don't wait until it is right before you share, at least with your collaborators; students seem to resist this, not understanding that sharing and discussing is the best way to find out what is right.