Professor Carole Goble, University of Manchester, talks at the RIN "Research data: policies & behaviour" event as part of a series on Research Information in Transition.
Data sharing - Data management - The SysMO-SEEK Story
1. Data sharing - Data management - The SysMO-SEEK Story
Professor Carole Goble FREng FBCS CITP
University of Manchester, UK
carole.goble@manchester.ac.uk
2. 13 teams
91 institutes, 300 scientists
Multi-site, multi-disciplinary
Each of three years' duration
Data generation
Data consumption
Data analysis
Data management:
Local – Shared – Long term
Pan European
Systems Biology
http://www.sysmo.net
4. Legacy: own data solutions. Wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke commercial tools ... files and spreadsheets.
Suspicion: extreme caution over sharing. Modeller-vs-experimentalist tribalism.
Dynamics: many institutions, many projects, overlapping memberships, changing membership. Projects ending, starting, carrying on the same, carrying on differently.
Skills: expert scientists, inexpert informaticians. Few resources.
Data: patchy standards, incomparable data, an afterthought.
6. Data mine-ing
“my impression of researchers, and I can criticize myself in this, is that we’re much more interested in sharing data when we mean sharing somebody else’s as opposed [to] sharing ours.”
E-infrastructure: taking forward the strategy, RIN report, 2010
8. “It’s not ready yet.”
“I need to get (another) publication first.”
“We don’t have the resources or skills to prepare it for others, especially now we’ve finished that project.”
“It’s faster/easier to do it myself, and I’ll keep the credit/control too.”
“It’s not described enough to be usable.”
“I don’t trust the quality. It’s not reliable enough. It’s too noisy.”
“Others won’t use it properly.”
“It’s not worth my while.”
“They are my competitors!!”
10. 2. Preparation for Use
Curation
Standards
Reusability
Reproducibility
Accountability & Quality
Data discipline. Silo busting.
11. CIMR Core Information for Metabolomics Reporting
MIABE Minimal Information About a Bioactive Entity
MIACA Minimal Information About a Cellular Assay
MIAME Minimum Information About a Microarray Experiment
MIAME/Env MIAME / Environmental transcriptomic experiment
MIAME/Nutr MIAME / Nutrigenomics
MIAME/Plant MIAME / Plant transcriptomics
MIAME/Tox MIAME / Toxicogenomics
MIAPA Minimum Information About a Phylogenetic Analysis
MIAPAR Minimum Information About a Protein Affinity Reagent
MIAPE Minimum Information About a Proteomics Experiment
MIARE Minimum Information About a RNAi Experiment
MIASE Minimum Information About a Simulation Experiment
MIENS Minimum Information about an ENvironmental Sequence
MIFlowCyt Minimum Information for a Flow Cytometry Experiment
MIGen Minimum Information about a Genotyping Experiment
MIGS Minimum Information about a Genome Sequence
MIMIx Minimum Information about a Molecular Interaction Experiment
MIMPP Minimal Information for Mouse Phenotyping Procedures
MINI Minimum Information about a Neuroscience Investigation
MINIMESS Minimal Metagenome Sequence Analysis Standard
MINSEQE Minimum Information about a high-throughput SeQuencing Experiment
MIPFE Minimal Information for Protein Functional Evaluation
MIQAS Minimal Information for QTLs and Association Studies
MIqPCR Minimum Information about a quantitative Polymerase Chain Reaction experiment
MIRIAM Minimal Information Required In the Annotation of biochemical Models
MISFISHIE Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments
STRENDA Standards for Reporting Enzymology Data
TBC Tox Biology Checklist
BioPAX Biological Pathways Exchange http://www.biopax.org/
FuGE Functional Genomics Experiment
MIBBI Minimum Information for Biological and Biomedical Investigations http://www.mibbi.org/index.php/MIBBI_portal
Metadata Minefield
14. Blue Collar Science
John Quackenbush
Difficult and time consuming
Poor Credit or Reward
Shabby Career Paths & Prospects
15. 3. Credit Crisis
• Reward sharing, curation and
reuse rather than reinvention.
• Credit. Attribution. Citation.
• For software, methods and
standards too.
• Technical (DataCite.org; a minimal sketch follows this list).
• Cultural (Respected policy).
• Institutional.
• Funding bodies.
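As one concrete illustration of the technical side: a DOI registered through DataCite can be resolved to a formatted citation using standard DOI content negotiation (documented at https://citation.crosscite.org/docs.html). A minimal Python sketch follows; the DOI string is a placeholder, not a real dataset, and the requests library is assumed to be installed.

# Minimal sketch: turn a dataset DOI into a formatted citation via DOI
# content negotiation. The DOI below is a placeholder; substitute a real
# DataCite-registered dataset DOI.
import requests

def fetch_citation(doi, style="apa"):
    """Ask the doi.org resolver for a formatted bibliographic citation."""
    response = requests.get(
        "https://doi.org/" + doi,
        headers={"Accept": "text/x-bibliography; style=" + style},
        timeout=30,
    )
    response.raise_for_status()
    return response.text.strip()

print(fetch_citation("10.1234/example-dataset"))  # placeholder DOI

The same request with Accept: application/vnd.datacite.datacite+json returns the full DataCite metadata record rather than a formatted string.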
16. 4. Infrastructure, Capability & Capacity
• Three year PhD/project cycle
• Local data control
• Realistic paths to adoption by busy people.
• Spreadsheets, wikis, catalogues and yellow pages.
• Content and Tools
18. 6. Sustained Resources
• Three year projects.
• Three year lifespan of data (and its software).
• Sunsets and Sustains
• Reinvention rewarded
• Institution.
• Funding councils.
• Funding panels.
• Publishers
• Libraries
• National data centres
• International data centres
20. A Partnership
• Software engineers
• Computational scientists
• Experimental Scientists
• Domain informaticians
• Service providers
• Funding agencies
• But the community
credit crisis continues….
21. Summary
• Science is a complex social activity undertaken by tribes of people and dominated by trust issues.
• Infrastructure has to be there and fit for purpose, but it’s not the real problem.
• We need a cultural shift (on all sides) that truly honours data.
Speaker notes
Learn about JISC’s work in the area of shared services for STEM subjects, particularly the JANET network service and virtual research environments (i.e., web tools for helping research processes)
Explore new opportunities for research being opened up via shared services, and also the economic savings this creates
Consider the role their university might play in providing a shared service to other institutions
Not major data centres but the long tail
Data pipeline
Data funnel
Fuzzy line between collaborators and competitors
USB drives, wikis, databases,
distributed in email etc.
Sharing without fear
MaDaM project
Competitive advantage.
Academic vanity.
Adoption.
Reputation.
Acceleration.
Novel insights.
Help.
Scrutiny.
Being scooped.
Misinterpretation.
Reputation.
Trust.
Not comprehensible
New Reward Schemes
But we have to be aware of the drivers for collaboration.
Competitive advantage: be the first with the Nature paper.
Academic vanity: credit, credibility, fame, acclaim, recognition, peer respect, reputation.
Adoption: get my stuff adopted / recognised. More funding.
Being found out: open to rigorous inspection.
Being scooped: beaten by lab X. Protecting my turf. Releasing results too early. Getting left behind. Being out of fashion. Looking stupid.
Being misinterpreted or misrepresented: looking stupid, losing control, taking a risk.
Some excuses
Genomics Standards Consortium
http://gensc.org/gc_wiki/index.php/MIBBI_workshop
All or nothing
Credit, Citation, Career
Personal and institutional visibility
Scholarly citation metrics
contribute, curate, review, reuse.
Data is not respected
John Quackenbush, Professor of Computational Biology and Bioinformatics, Department of Biostatistics, Harvard School of Public Health.
58% developed by students, 24% stated not maintained
(Schultheiss et al. (2010) PLoS Comp Biol (in review))
Tools, commons
Preparing data for sharing is free like puppies are free
National Center for Biomedical Ontology
The Open Biological and Biomedical Ontologies
Standardise messages not structures
Only as good as your data services
Minimum models and Controlled vocabularies
DOIs cost.
The hard core are the PALs.
Commons-based Cleanup
● Manual and automated curation workflows (a minimal sketch of an automated pass follows) ● Curators emergent and assigned ● Curation tools
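The "automated" half can start as simple rules that flag assets whose metadata a curator should chase. A minimal Python sketch, where the field names and the length threshold are illustrative assumptions rather than the real SEEK or myExperiment schema:

# Illustrative automated curation pass. The field names and the 40-character
# threshold are assumptions for this sketch, not a real SEEK/myExperiment schema.
REQUIRED_FIELDS = ("title", "description", "creator", "licence")

def curation_flags(asset):
    """Return flags for metadata that a human curator should follow up."""
    flags = ["missing " + field for field in REQUIRED_FIELDS if not asset.get(field)]
    description = asset.get("description", "")
    if description and len(description) < 40:
        flags.append("description probably too short to support reuse")
    return flags

assets = [
    {"title": "Glucose pulse time series", "creator": "Lab X"},
    {"title": "Glycolysis model v2",
     "description": "Kinetic SBML model of glycolysis, fitted to chemostat data.",
     "creator": "Lab Y", "licence": "CC-BY"},
]
for asset in assets:
    print(asset["title"], "->", curation_flags(asset) or "looks ok")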
Incentives
Right time, right place. Also email!
Third party curation is really hard
Expert curation
Classification
Weeding
Added value
Structured metadata
Prompting
Classification
Filtering
Faceted browsing (a small sketch follows)
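Faceted browsing reduces to counting assets per metadata value and intersecting the user's selections. A minimal Python sketch, with illustrative records rather than the real SEEK metadata model:

# Minimal faceted-browsing sketch. The asset records are illustrative,
# not the real SEEK metadata model.
from collections import Counter

assets = [
    {"type": "data file", "project": "SysMO A", "format": "xls"},
    {"type": "model", "project": "SysMO A", "format": "sbml"},
    {"type": "data file", "project": "SysMO B", "format": "csv"},
]

def facet_counts(items, field):
    """Count how many assets fall under each value of one facet."""
    return Counter(item.get(field, "unknown") for item in items)

for field in ("type", "project", "format"):
    print(field, dict(facet_counts(assets, field)))

# Filtering is then just intersecting the selected facet values:
selection = [a for a in assets
             if a["type"] == "data file" and a["project"] == "SysMO B"]
print(selection)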
Time to get organised
One example workflow can be found at: http://www.myexperiment.org/workflows/16. This is the old example workflow, but I have tagged it as a benchmark. You can see the breakdown of tags given to it at: http://www.myexperiment.org/workflows/16/curation, or by clicking on the breakdown section (see attached image).

14 curation tags. Some are slightly ambiguous and others have little meaning. These were:
* test workflow
* component - part of whole solution
* whole solution
* tutorial / example
* incomplete
* junk
* obsolete - deprecated
* runnable
* not runnable
* requires description
* requires credit / attribution
* requires example input data
* description; [Description Text]
* example data; [port : value]

Each tag was preceded by a "c:" so that it would be picked up by the myExperiment plugin and could be differentiated from other myExperiment tags. If some example data was known, I tried to add it using the example tag "example data; [port : value]", where the port name is given along with the data to be put into the port.

The whole process was very time consuming, as I had to try to open each workflow in T2, run it using some example data (or figure out what it did and run it with lots of test data), and then add each comment (checking each workflow on myExperiment to see if it had completed properly).
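Since the "c:" prefix is what distinguishes curation tags from ordinary myExperiment tags, the separation is a one-line partition. A minimal Python sketch; the sample tag list is illustrative, and this is my guess at how such a prefix convention would be consumed, not the plugin's actual code:

# Sketch: separate "c:"-prefixed curation tags from ordinary myExperiment
# tags. The sample tags are illustrative; this is not the plugin's real code.
def split_tags(tags):
    """Partition tags into (curation, ordinary) on the "c:" prefix."""
    curation = [tag[len("c:"):].strip() for tag in tags if tag.startswith("c:")]
    ordinary = [tag for tag in tags if not tag.startswith("c:")]
    return curation, ordinary

tags = ["c:runnable", "c:tutorial / example", "bioinformatics",
        "c:example data; [port : value]"]
curation, ordinary = split_tags(tags)
print(curation)  # ['runnable', 'tutorial / example', 'example data; [port : value]']
print(ordinary)  # ['bioinformatics']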
Add url here
E-Lab and Taverna (all my software): elephants? The elephant in the room, blind men and elephants, the danger of being white elephants?
SysMO
And other e-Science projects
Each of these applies to all our projects. Just one of them is not enough. Not even for Taverna.
To sustain it as a service we must sustain the software and the content in its repositories.